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The 2° Fractional Factorial Designs 
Part IL. 


G. E. P. Box ann J. S. HUNTER} 


Statistics Department, University of Wisconsin and Mathematics 
Research Center, University of Wisconsin* 


but that there turtle is an insect.” 


6. ResoLution V DEsIGNS 


In Part I of this paper the construction of two version factorials of resolution 
III and IV was discussed. In some situations we need experimental designs in 
which all main effects and two factor interactions are unconfounded with all 
other main effects and two factor interactions. Such designs are called designs 
of resolution V since each word in the defining relation has five or more 
characters. The one-half replicate of the 2° factorial with defining relation 
I = 12345 mentioned in Part I of this paper is a resolution V design, that is, 
a 2°-* fractional. 

Since there are k(k + 1)/2 main effects and two factor interactions to be 
estimated, these designs require considerably larger numbers of experimental 
runs than designs of resolution III or IV. No simple fractional of resolution V 
exist for two, three or four variables. In fact, the largest number of factors 
which can be included in a fractional of resolution V is as follows: 


Resolution V Designs 
Number of Runs Number of Factors 


16 5 
32 6 
64 8 
128 11 


Construction of Resolution V Designs 


Table 23 lists the resolution V designs for k = 5, 6, --- , 11. The use of this 
table for constructing the appropriate design and arranging it in blocks may be 
illustrated in the case of the one-sixteenth replicate of the 2"* design, i.e., the 
2\'~* design. In this design there are 2""~* = 2’ = 128 runs so that we first write 
down in standard order the seven columns of the full 2’ design. The first column 


* Research Sponsored by the United States Army under Contract No. DA-11-022-ORD- 
2059. 


+ Now with the Chemical Engineering Department, Princeton University. 
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corresponding to variable 1 consists of alternating minus and plus signs, the 
second column alternating pairs of minus and plus signs and so on. We now add 
four further columns using the relations found in column five of Table 23. The 
new vector associated with the variable 8 is generated by multiplying together 
the minus and plus signs of columns 1, 2, 3 and 7 to obtain the elements for the 
interaction column 1 2 3 7. These are then used to identify the minus and plus 
signs of variable 8, that is, 8 = 1 2 3 7. A second new column is written down 
in a similar way for variable 9 using 9 = 2 3 4 5. Similarly 10 = 1 3 4 6 and 
11 = 1234567. We now have a complete one-sixteenth replicate of the 2" 
design with generators 


12378, 23459, 134610, 123456711, 


The full defining relation of this design is therefore made up of the words: 


I 1256910 
12378 167911 
23459 2571011 
134610 35678910 
123456711 13581011 
145789 2368911 
2467810 34791011 
456811 124891011 


We observe that all of the words in this defining relation have five or more 
characters identifying the design as one of resolution V. The generators given 
above are for the principal fraction. The generators for all sixteen fractions are 
given by 
+12378, +23459, +134610, +1234567 11 

To block this design in eight blocks of 16 runs we write down three further 
columns for B, , B, and B; using the relations in column seven of Table 23. The 
two versions of B, are obtained by multiplying together the minus and plus signs 
of columns 1, 4 and 9, the versions of B, using columns 1 2 and 10, and the ver- 
sions of B, using columns 8 9 and 10. The eight blocks are then identified by those 
runs having the signs of B, , B, , and B; equal to (— ,—,—), (+,—,—), (—, +,-—-), 
(+, +, =), (—, == +), (+, = +), (=, +; +), and (+, +, +). 


Designs with 16 Runs: The 25>* Design 
With five variables, a 2'>' design is obtained using the defining relation 


I= +12345. 


The 16 version combinations for such a design are written down by first writing 
out the design matrix for the 2* complete factorial and then identifying the 
versions of the fifth variable with those of the interaction 1 2 3 4. Clearly all 
main effects are here confounded with four-factor interactions since the alias 
relationships of main effects are of the type 


+1=2345 
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while those for the two-factor interactions are of the type 
+12 =345. 


It follows that if we can ignore three and four-factor interactions we can estimate 
with these sixteen runs the average, the five main effects and the ten two-factor 
interactions. This design is remarkable in that every degree of freedom is used 
to estimate effects of interest. However none are available for confounding 
so that it is not possible for this design to be run in blocks without associating 
one or more main effects or two-factor interactions with the block variables. 


Designs with 32 Runs: The 25>’ Design* 


For six variables there are 21 effects to estimate (the six main effects and the 
15 two-factor interactions). It is clear that a quarter replicate of a 2° factorial 
involving 16 points would not be large enough to estimate all the quantities 
of interest. The half replicate has as its defining relation 


I= +123456. 


Using this design main effects are associated with five-factor interactions, for 
example, +1 = 2 3 45 6 and two-factor interactions are associated with four- 
factor interactions, for example, +12 = 3456. 


32 Runs, The 25>' Design in Two Blocks of 16 


In blocking this and other fractional factorial designs it is important to bear 
in mind that any factor associated with blocks will have its aliases also associated 
with blocks. For the 2’ design for instance, where the alias relation 

= 12345 6already exists, if we adopt for the block arrangement B, = 12345 
then we should also have B, = 6. Thus we would confound the main effect of 
variable 6 with blocks. In general, when choosing a suitable factor to be associated 
with the block variables we must be careful to ensure that not only the chosen 
factor but also its aliases correspond to at least three-factor interactions. The 
present design can be run in two blocks of 16 by using any three-factor inter- 
action to define the block. For example, for block variable B, we can set 


B, = 123 


whence, because of the alias relationship I = 1 2 3 4 5 6, we find that B, = 
123 = 456, and no two-factor interaction is confounded. 


Designs with 64 Runs: The 27>' Design 


With seven variables there are 28 effects to be determined (the seven main 
effects and twenty-one two-factor interactions) so that in principle it might 
be possible to use a quarter replicate of the full 2” design which would involve 
32 experimental runs. In practice however this is not possible for the type of 
fractional factorial designs discussed here. Other designs, such as the three 
quarter replicate designs discussed elsewhere in this issue, can however be em- 

* The 2! and 27-! designs are properly of resolution VI and VII respectively. The block 


arrangements however insure that first and second order effects will not be associated with 
blocks or with each other. 
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ployed. The best that can be done for this series of conventional fractional 
factorials is to use of half replicate and once again it is desirable to use for the 
defining contrast the highest possible order interaction, i.e., 


I= 1234567. 


This seven variable design can be run in eight blocks of eight experimental 
runs each using the following four-factor interactions to define the block variables: 
B, = 1357,B, = 1256and B,; = 1234. To write down the complete design, 
first write down all the minus and plus signs for the 2° factorial in standard 
order. Introduce the seventh variable using 7 = +123 45 6. The eight blocks 
can then be obtained by writing down the three additional vectors B, = 135 7, 
B, = 1256 and B; = 1 2 3 4 and allocating the experimental points to the 
first block, second block, --- , eighth block in accordance as the signs of B, , 
B, and Bs; are (=; wf ae} (+, a =}, (=; +, ~), (+, +, <=), (=, se +), 
(+, —, +), (—, +, +) and (+, +, +). The derivation of this design is given 
in the Appendix. 


64 Runs: The 25>? Design 


For eight variables there are 36 effects to be estimated, the eight main effects 
and twenty-eight two-factor interactions. These estimates can be obtained with 
a one-quarter replicate of the 2° design involving 64 experimental points, that 
is, a 25? design. The defining relation will include three words in addition to 
the identity I and if all main effects and two-factor interactions are not to be 
confounded then all the words in the defining contrast must be at least five- 


factor interactions. Bearing in mind that the three words in the defining contrast 
will be such that any two of them multiply to give the third, the best arrange- 
ment will be of the type indicated below, in which the interactions involved are 
12347,12568 and their product 3 45 67 8 giving: 


1234 7 
12 56 8 
345678 


From the above array it will be seen that we are trying to pick two interactions 
containing five factors or more which have “‘minimum overlap’, so that their 
product will also be an interaction of five factors or more. We therefore take 
for the generating relations 


I=+12347; I=+12568 


and write down the design by first setting out the full 2° design for the variables 
1, 2, 3, 4, 5 and 6. The versions of the variables 7 and 8 then follow the minus 
and plus elements of the interactions 1 23 4and 1256. 

It is possible to arrange this 2\~” design in four blocks of 16 runs each without 
confounding block variables with any main effect or two-factor interaction. 
To see how this can be done, we remember that although it would be possible 
to write down 63 columns associated with all the main effects and interactions 
of the basic 2° design only 36 of these will be associated with main effects and 
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two-factor interactions for the eight variables of interest. There remain therefore 
63 — 36 = 27 columns containing + and — signs which can be utilized to 
accommodate block effects. If the experiment is run in two blocks of 32 any 
one of these 27 columns can be used to accommodate the blocks. In practice it 
is not necessary to write down all the 63 columns. In fact we select the columns 
for blocking by inspection of the generators. Some care is needed in this selection. 
For example, the 1 2 3 4 interaction contains four symbols and so on first sight 
appears to provide as a suitable column to be used in blocking. However, it 
cannot be used since 1 2 3 4 is an alias of 7 and therefore the main effect of 7 
would be confounded with blocks. This is so since one of the generators for the 
quarter replicate design is 1 2 3 47 and consequently 1234 = 7. 

Clearly it is necessary to check not only that the interaction employed in- 
volves three factors or more but also that all its aliases involve three factors or 
more. The simplest way to pick out a suitable interaction is to write out the 
defining relation for the fractional factorial 


I= 12347=12568=345678 


and then by inspection select interactions whose multiples with the words of 
the defining relation contain three or more characters. For instance, suppose 
1 3 5 is selected as a possible interaction to confound. Then 


135=2457=2368=14678. 


Thus the three-factor interaction 1 3 5 is confounded with no interaction con- 
taining fewer than three factors, and hence, 1 3 5 is a suitable contrast to use 
for blocking. 

If we wish to run the experiment in four blocks of 16 then another interaction 
of three factors or more having no aliases with less than three factors must 
be chosen with the additional requirement that its interaction with the first 
interaction used to identify blocks has aliases all of three factors or more. For 
example, take 13 5 and 3 4 8. Using these, the three degrees of freedom among 
the four blocks will be associated with 1 3 5 and 3 4 8 and with their interaction 
145 8. That these give satisfactory aliases is seen by obtaining their multiples 
with the defining relation of the fractional factorial giving: 


135 2457 = 2368=14678 
348 1278=123456= 567 
1458=23578= 246= 1367 


Thus as indicated in column six of Table 23, the + and — signs associated with 
the interaction vectors 1 3 5 and 3 4 8 can be used to identify four blocks of 
sixteen runs each. 


Designs with 128 Runs: The 25>, 2\’-* and 2)'~* Designs 


The designs for nine and ten variables using 128 runs can be regarded as a 
special case of the 2\/~* design from which one or two variables have been 
dropped. In many cases therefore where nine or ten variables are to be studied 
and where there are one or two further variables of possible importance, it 
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would be worthwhile to include these and in fact to run the full eleven variable 
design. We shall first discuss the 2}'~* design and then discuss the designs for 
nine and ten variables as special cases. To construct the 2/'~* design we begin 
by writing down the — and + signs comprising the seven columns for the 
complete 2” factorial. The sixteenth fraction of the 2"' design is then obtained 
by associating certain interactions between the seven variables with the new 
variables 8, 9, 10 and 11. Let us call these interactions W, X, Y and Z, so that 
the column headings for our final design matrix will be as follows: 


WX YZ 
12345678 91011 


It will be understood that each of the symbols W, X, Y, Z, corresponds to an 
interaction of the first set of seven variables. For example, if W is the interaction 
1 2 3 4, the generators produced by associating W with the variable 8 would be 
12348 or W8. In general the generating relation for the complete design are 


I=W8, I=X9, I=Y10, I=Z11 


and hence the defining relation isI = W8 = X9 = Y10= Z11=WX89 = 
WY810=WZ811l = XY910 = XZ911=YZ1011 => WXY8910 = 
WXZ8911=WYZ81011 = XYZ91011 = WX YZ891011. Now 
if this design is to be of resolution V all the words in the defining relation must 
contain at least five symbols; it follows that: 


(i) W, X, Y and Z must themselves each contain at least four symbols. 
(ii) The products W X, W Y, W Z etc., must contain at least three symbols 


since all words in the defining relation containing these pairs will also 
contain two further symbols from the group 8, 9, 10, 11. 

(iii) Similarly the products in threes W X Y, W X Z, etc., must contain two 
symbols and 

(iv) the product W X Y Z must contain at least one symbol. 


The problem is therefore to find four interactions each containing at least 
four symbols with all the above properties. A set of interactions having these 
properties has in fact already been derived. It was shown earlier that the de- 


fining relation for the 27;; design is: 


1=124=135=236=1237=2345=1346=347=1256 
=257=167=456=1457=2467=3567 =1234567. 


A set of generators for this defining relation are 1237, 2345, 1346 and 
1234567. Suppose then we choose: 


W=1237 
X¥=2345 
Y=1346 
Z=1234567 
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Then clearly all conditions (i), (ii), (iii) and (iv) mentioned above will be satisfied. 
A one-sixteenth replicate of the 2’ for which no main effect or two-factor inter- 
action is confounded with any other main effect or interaction, that is, the 
2\;~* design can be obtained by setting 


+ 8=1237 
+9=2345 
+10 = 1346 
#11 = 1234567 


to give the generators 
+12378; 4223459, +134610,41234567 11 


Given that three-factor and higher order interaction effects are negligible the 
resulting 2\/~* design provides separate estimates of the eleven main effects and 
fifty-five two-factor interactions. 

The 2'?-* and 2$-? designs may be obtained from the 2/'~* design by dropping 
out respectively one or two of the variables. The defining relations for these 
designs omit all words containing the dropped variables. All designs obtained by 
dropping variables will have the properties of the parent design. However, 
selections can be made which improve the reduced design. For example, if we 
drop variable 11 the generators for the resultant 2}?~* design are 


+12378; +23459; +134610 


The corresponding defining relation contains three words with five characters, 
three with six and one word with seven characters. However, if variable 10 
is dropped the generators are 


+12378; +23459; +123456711 


and the defining relation now contains four words with five characters tiwo with 
six and one word with eight. The first arrangement is slightly preferable since, 
with fewer five character words, fewer two-factor interactions will beconfounded 
with three-factor interactions. 

Similarly for the 2"’ design it is best to drop variable 3 in addition to variable 
11. The generators for this design are 


+145789; +2467810 


from which we see that all main effects will be confounded with only five-factor 
interactions and two-factor interactions only with four-factor interactions. 
(This design is, in fact, of resolution VI). 


128 Runs: The 2}'~* in 8 Blocks of 16 Runs 


It is a somewhat remarkable fact that the 2//~* design can be blocked into 
eight groups of sixteen runs. Using the generators for the principal fraction of 
the 2\'~* given in the previous section, we obtain the defining relation: 
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I1=12378=23459=134610 = 123456711 =145789 
= 2467810 = 456811 = 1256910 = 167911 =2571011 
= 35678910 = 2368911 = 13581011 
= 34791011 = 124891011. 


To obtain eight blocks we require seven interactions produced by three block 
generators which, when multiplied together and by the words in the above 
defining relation produce words with three or more characters. It can be con- 
firmed by actual multiplication that a suitable group of block generators is 


B, = 149, B, = 1210, B, = 8910. 


We find that the following seven interactions will be confounded with block 
effects: 


B, = 149 B,B, = 24910 
B, = 1210 B.B, = 14810 
B;, = 8910 B.B, = 1289 

B,B.B, = 248 


When variables 3 and 11 are dropped from the eleven variable design these 
same block generators may also be used to generate the blocks for the ten and 
nine factor resolution V designs. 


APPENDIX 


The Derivation of the 27>" in Blocks of Eight 


The 27° contains sixty-four experimental runs and is constructed by first 
writing down a full 2° factorial and then setting 7 = 123456. To divide the 
design into eight blocks of eight runs such that no main effect or two-factor 
interaction is confounded with any block effect we must be able to find seven in- 
teractions to associate with the seven degrees of freedom between the eight blocks 
and each of these interactions and their aliases must be at least three-factor 
interactions. To see how this can be done we refer once again to the table of minus 
and plus signs appropriate to the 2° factorial. This table, shown below, has been 
used to produce a second table in which a number appears if there is 
a minus sign in the first table, and does not appear if there is a plus sign. 
If we associate the first three columns of the first table with B, , B, and B, 
we see, that the interactions B,B, , B,B; , B,Bs and B,B,B, will identify the 
remaining columns. We also note that all of the columns contain four minus 
and four plus signs. Referring now to the second table, when two columns are 
multiplied together only those numerals which appear once, or an odd number 
of times, will appear in the product, and those numerals appearing an even 
number of times will not appear in the product. This is exactly the same rule 
adopted earlier to assist us in identifying the alias terms. It is clear therefore 
that setting B, equal to the 1357 interaction effect, B, = 1256 and B, = 
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TABLE 


B.B; 


+ sign in 
23 factorial 


confounded inter- 
actions for the 
4‘ replicate of 
the 27 design 


1234 that all seven of the block effects will be confounded with four-factor 
interactions. Since the 2)" has the defining relation I = 123 4567, each of the 
block defining contrasts will therefore be aliased with those three of the seven 
numerals which do not appear in it. Thus, the 277" is blocked such that all block 
effects are aliased with at least a three factor interaction and hence clear of both 
main effects and two factor interactions as required. Of course, before blocking, 


the design with defining relation I = 1234567 is of resolution VII. 
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Reduced Designs of Resolution Five 


Joun C. WurrwEtt* anp GraHaM K. Morsey** 
Princeton University, and the Textile Research Institute, Princeton, New Jersey 


The two level fractional factorial designs of resolution five enable the experimenter 
to estimate independently all main effects and two-factor interactions under the 
assumptions that higher order interaction effects are negligible. By relaxing, very 
slightly, the requirement that all two-factor interactions be estimable, or that all 
estimated effects be orthogonal, the number of runs required for many resolution 
five designs can be greatly reduced. 


INTRODUCTION 


Consider the situation where a multivariable, two-level, factorial design 
appears to be desirable. The design can arise from either of two causes. In the 
first, little may be known of the importance of the individual variables under 
the control of the experimenter and the experiment will be concerned with a 
“screening” of these variables. The experimental program thus might start 
with some highly fractionated factorial, and the progress of the experimentation 
would be in the form of a series of blocks, each orthogonal to the former blocks. 
The particular succession of blocks chosen would attempt to obtain the maxi- 
mum amount of information consistent with conclusions drawn from results of 
the previous blocks. 

In the second case, all of the variables appear certain to have effects. This 
condition is due either to previous screenings of a larger number of variables, 
yielding those under present consideration as the most important, or to ex- 
perience with the process or system, leaving little doubt that all variables or 
their combinations in the form of first order interactions (two-factor interactions) 
exist in important magnitude. 

Given that the desirability of the inclusion of a variable has been established 
the questions to be resolved by the experimental program include the evaluation 
of the effect of each individual variable along with their two-factor interactions 
(plus other matters such as variance of estimates). The desirability of the inclu- 
sion of the variable in the model has been established. Assuming that the proper- 
ties of similarity and smoothness usually encountered in practice tend to mini- 
mize interactions between three or more variables (1), and that no non-linear 
relationships between responses and independent variables can be written 
a priort, the situation is ideal for the two-level factorial. However if the number 
of independent variables is high, the minimum program which will completely 
resolve all main effects and two-factor interactions independent of each other 
(the irreducible Resolution-5 design, designated type B’ by Box and Wilson (2)), 
may well be large enough to be unwieldy. Using the symbol k as the number 
of independent variables, with p an integer such that 1/2” represents the frac- 


* Professor of Chemical Engineering, Princeton University. 
** Current address: Dunlop Research Center, Toronto, Ontario. 
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tion of the entire 2° experimental program, the minimum Resolution—-5 program 
can be readily found for k as large as 11 (1): 


k Size of Program = 2*-” 
7 64 

8 64 

9 128 
10 128 
11 128 


This paper will discuss further reductions in the size of the programs below 
those listed above, the methods for effecting the selection of the experiments 
to be performed from the complete list required in resolution-5, and the impli- 
cations inherent in the reduction.* It is, of course, obvious that the intended 
reduction will result in some loss in resolution, in orthogonality, and/or in 
precision. 


GENERAL CHARACTERISTICS OF RESOLUTION-5 DESIGNS 


Designs of this type require the estimation of k(k + 1)/2 effects in addition 
to the mean. Thus the program, consisting of 2'~? experiments, must contain 
a minimum of 1 + k(k + 1)/2 experiments. These designs can, in most cases, 
be broken into orthogonal blocks in which no main effect or two-factor inter- 
action is confounded with blocks. These blocks, which then cortain 2‘~-”~” 
treatment combinations, can be treated as individual and independent sections 
of the whole 2‘~” design.** 

Except in the case of the 2°-* design, where every degree of freedom is used 
to estimate a main effect or two-factor interaction, the minimum Resolution-5 
designs are larger than necessary; i.e., the number of experiments is considerably 
greater than the number of parameters to be estimated as shown in Table 1. 
Extremely ineffective in this respect are the designs for 7, 9, and 10 variables, 
for, in these cases, additional variables can be introduced without increasing 
the minimum number of experiments for a complete Resolution-5 type. For 
example, an 11 factor design requires only 128 experiments, no larger number 
than is required for either 9 or 10 variables. 


Sreciric Reso.ution-5 DrEsiGns 


The fractionating generators and all the resulting defining relations shall 
contain at least five factors to avoid confounding two-factor interactions with 
each other. The blocking generators and all the resulting defining relations for 
blocks must contain at least three factors in order to avoid losing main effects 
or two-factor interactions in the confounding with blocks. 


* See also Reference 4. 

** Confounding into fractions, or into blocks, is fixed by sets of ‘defining contrasts.” 
Each element in the set of defining contrasts is one of the 2* factorial effects. Of the 2P elements 
in the set of defining contrasts for fractionating, one is always the identity, p others may be 
freely chosen so long as they are independent. The remaining 2P — p — 1 are determined by 
all possible product combinations of the p independent contrasts chosen. It is for this reason 
that the p elements which are independently chosen are termed generators. 

The same remarks apply to the m blocking generators, 
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In Table I generators for Resolution-5 designs are shown (1). 


TaBLe [ 


Number Generators Design 
Minimum Number Param- ———————————— 


Vari- Size, Reso- of Experi- eters to Number Block 
ables _lution-5 ments Estimate Fraction Block of Blocks Size 


ie 16 16 12345 wae 16 
2" 32 22 123456 16 
64 29 1234567 


12347 
12568 
14568 9* 
1357810 
12348** 
23569 
2457 10 
12348 
23569 
2457 10 
123456711 


* The nine-factor design is derived from the 2‘ by dropping variables 2 and 11. 
** The ten-factor design is derived from the 2"'~‘ by dropping variable 11. 


REpDvucED DEsIGNs OF RESOLUTION-5 


It will be shown that, when a fractional factorial design can be broken into 
more than two orthogonal blocks, it is often possible to estimate all the main 
effects and nearly all the two-factor interactions without completion of the 
total 2*-? design. All that is required is a judicious selection of certain blocks 
from within the fractionated design and a suitable interpretation of the results. 

There is no attempt to prove that the designs suggested are the best of this 
type. However, the subsequent remarks on choice of blocks will indicate that 
the choice is not random and that certain advantages may be expected to accrue 
from the specific type of choice suggested. 

Consider the first problem, selection. Arbitrarily the principal block, which 
estimates all the main effects and their associated confounded two-factor inter- 
actions added together, will be used as a basis in the reduced design. The method 
of obtaining these designs is illustrated first with the 2~*. 
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THE ELEVEN-Factor DESIGN 

A satisfactory design has already been noted to have the following generators: 

for fractionating: 
IT=12348 =23569 = 245710 = 123456711 

and for blocks: 
The eight combinations of signs (+, +, +), associated with the blocking gener- 
ators, identify the eight blocks. The signs of the generators in the principal 
block, as previously designated, are all plus. The size of each block is 2’*~*~* = 2°, 
and therefore contains 16 experiments. The design matrix for the principal 
block of the 2**~*~* is shown in Table II. 

To obtain block (+, —, +) from the principal block the sign of 3, 4, or 10 
must be changed, or all three must be changed. Arbitrarily select the sign of 3. 
Then in order to maintain the proper signs on the other blocking generators 


and the fractionating generators, the signs of factors 8, 9, and 11* must also be 
changed. The other blocks are obtained as follows: 


B Change Signs of Factors 


Bs 
- + 9, 10, 11 
“f. 


8, 11 
9, 


8 
_ 10 
8 


, 


; 9, 10, 11 
The effects which may be calculated from the 2"~*~* principal block are 
essentially sums (i.e., specific linear combinations) of the confounded main 


effects and two-factor interactions. In this instance, these confounded effects 
(or contrasts) are as follows: 


1+710+ 29+ 68 
2+ 19+610+ 78 
3+ 58+410+ 911 
4+ 59+310+ 811 
5+ 49+ 38+1011 
6+ 79+210+ 18 
7+110+ 69+ 28 
8+ 274+ 16+ 35+411+910 
9+ 12+ 67+ 4543114810 
10+ 17+ 26+ 34+511+ 89 
ll+ 39+ 48+ 510 
13+ 56+ 474+ 211 
14+ 25+ 37+ 611 
15+ 24+ 36+ 711 
46+: 233+ 57+ Ii 


* A procedure for establishing sign changes and for treating the X’X sub-matrices which 
result is summarized in the appendix. 


’ 
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B; 


t++++t++t++ts+++se+ 


Bz 
3410 8910 


F++++t+++te+t+++se+ 


ee 


Bi 
358 


24 


littl ib iteti itt 


23 


Piretstt tt i +t 


14 


ti +t+tlititi+tict+ 
P+it+i tr rtit 
+L iti teiti it 
++4+411 bl +t+++ 


tl itt rte r reer rt 


FIL PHL tei i teei tit 


Pei tti ti titi iti¢ 
ee 
[++i rt+tit+ti titties 
bil+++4+4+4+44 

br ++++i tr ti tees 
b++irt+eirttei rset 


i i ee ee 


BNO OO OS 


* This notation indicates that 124 = 5,23 4 = 6, ete. 
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In an 11-factor, two level, design, the number of parameters to be estimated 
is 66 plus the mean; however the number of treatment combinations required 
for a complete 2''~* is 128, almost double the required 67. Further, since the 
maximum number of effects confounded together is six, all main effects and two- 
factor interactions can potentially be estimated in six blocks of sixteen treat- 
ment combinations each, a total of 96 experiments instead of the full 128. This 
potential type of reduction will now be examined, stepwise. 

Selection of any four blocks, as for example, those given by the block signs 
(+, +; +), (+, ae +), =~, ss +), (~, +; +) gives a characteristic X’X 
matrix of a form which is readily partitioned into null sub-matrices plus a series 
of symmetric sub-matrices for each set of confounded interactions. These charac- 
teristic sub-matrices for all sets of confounded interactions except those associated 
with factors 8, 9 and 10 are given by: 


"he 


| 

oe 7 

ear | 
; “1: 

oe aad 
and this expression serves to define the symbol I, . 


The characteristic X’X matrices for all effects associated with factors 8, 9, 10 
are given by 


8 910 27 16 35 411 
column and/or row 
identification for 9 810 12 67 45 311 
three X’X matrices 10 89 17 26 34 511 


Since there are identical pairs of rows (and columns) in this sub-matrix, it is 
singular; i.e., its determinant is zero. This singularity is to be expected since 
there are but four equations (one per block) with which to solve for six unknowns 
(the confounded effects). For complete resolution of the design in 64 experiments, 
it is essential that a priort knowledge shall allow three of the remaining six 
effects to be assumed negligible in each of these three matrices. Further, these 
three effects must be properly chosen, one from each pair with identical rows 
in the X’X matrix. The alternative to this assumption is retention of the con- 
founded pairs, unresolved. This course is ordinarily undesirable in that one 
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pair from each matrix contains a main effect confounded with a two-factor 
interaction. 

Adding two additional blocks to the four blocks used previously enables 
partial resolution of the interactions of the design associated with factors 8, 9 
and 10. It is immaterial which two are added since all the unused blocks contain 
the minus sign on B; . The blocks (+, +, —) and (—, +, —) were used as 
basis for the following remarks. If these extra two blocks are added to the entire 
design, the orthogonal parts found in the four-block design become non-or- 
thogonal. Therefore, the use of these additional blocks may be restricted, if 
desired, to factors 8, 9 and 10 only. The orthogonality of the four-block design 
is then retained, on all contrasts except those initially confounded with factors 
8, 9 and 10, although some potential gain in precision is thereby disregarded. 

The characteristic X’X sub-matrix for the factors 8, 9 and 10 in the six-block 
design now becomes: 


3 1 
1 3 1 
1 -1 3 1 
—] 1 1 3 
column and/or row 8 910 27 16 35 411 


identification for 46 $331 67 Y 810 12 
. ; 
three X’X matrices 17 26 10 34 89 511 


The 2 X 2 sub-matrix, which may, of course, be treated separately, has an 
inverse 


ee ee 3 -1 
a mes 


However, the 4 X 4 sub-matrix is singular. It may be resolved if any one of its 
four contrasts can be assumed to be zero. For example, by dropping the third 
row (and column) of this sub-matrix, X’X becomes: 


31-1 
Xx’X = 32} 1 3 1 
-1 1 3 
This matrix is non-singular, with inverse 
2 
—1 
1 


a — 
XX" = G26) 
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Consequently, with three specific two-factor interactions assumed, or known 
to be, negligible, the eleven factors and their two-factor interactions can be 
uniquely determined in six blocks, or ninety-six treatment combinations (i.c., 
96 experiments). The estimates must be non-orthogonal for factors 8, 9 and 10 
and thier associated interaction effects. 


Tren-Factor DESIGN 


Referring to Table I, showing fractionating and blocking generators, the 
linear combination of effects in the 2°°-*~* principal block is seen to be the same 
as with eleven factors except that factor eleven has been dropped from the design. 

As with eleven factors, the selection of blocks (+, +, +), (+, —, +), 
(—, —, +), (-—, +, +) enables the separation of all the main effects and two- 
factor interactions except those associated with factors 8, 9 and 10. The charac- 
teristic X’X matrices for effects associated with all factors except 8, 9, and 10 
are unit matrices, multiplied by the scalar, 64. The matrices for factors 8, 9, 
and 10 are readily obtained from previous remarks on the eleven-factor design: 


8 910 27 16 35 
column and/or row 
identification for 9 810 12 67 45 
; - 
three X’X matrices 10 89 17 26 34 


The matrix is singular; it can be made non-singular by elimination of one 
each of the pairs of equivalent rows and columns. Therefore, the assumption 
that six selected effects are zero will be required in order to solve uniquely a 
ten-factor design in sixty-four treatment combinations. 

If one further block, in this case (+, —, —), is added to the design, partial 
resolution of the interactions confounded with factors 8, 9, and 10 can be effected. 
The characteristic X’/X matrices involving factors 8, 9, and 10 are then given by: 


1 
—1 
—1 

1 

5 
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S O10 27 £6 35 
column or row 


identification for o Sie 67 22 45 
three X’X matrices 2 617 10 89 34 


The matrix is still singular, but becomes non-singular when any one of the 
first four rows (and columns) is omitted. For example, with the fourth row 
(and column) dropped, 


This matrix is non-singular, with inverse 
7 
eK? =—7_|-9 


~ (16)(16) 
2 -2 


—2 2 


Therefore, three assumptions of effects which are negligible, or zero, will be 
required in order to solve the ten-factor design in eighty treatment combina- 
tions. However, the choice of assumptions is considerable both from within 
the matrix and from within the designation of the variables according to the 
numbering system used. 

Any attempt to reduce further the confoundings remaining after 5 blocks 
of 16 each is a matter of diminishing returns. The addition of a sixth block will 
resolve only two of the remaining three and would not appear to be an economical 
procedure under ordinary circumstances. 


Tue NINE Factor DESIGN 


The 2}? design is obtained from the 2/~* design by dropping factors 2 and 
11; the factor 2 is omitted in favor of factor 10 for convenience in comparing 
with the ten-and eleven-factor designs. Therefore, the defining relations used 
as generators are: 

I = 145689 = 1357810, the products of generators 1 and 2 and 
generators 1 and 3, in the 2\~* design; the blocking contrasts are unchanged; 


B,= +358 B,=+3410 B, = +8910 


Since the 2°~* design requires as many experiments as the 2"'™, it is un- 
economical in dealing with nine variables unless the resolution of effects in less 
than eight blocks is more effective than in the 2'*~* design. 
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The linear combinations of effects in the 2°-?~* principal block are: 
1+710+ 68 
3+ 58+ 410 
4+ 59+310 
5+ 49+ 38 
6+ 79+ 18 
7+110+ 69 
8+ 16+ 35+910 
9+ 67+ 45+810 
10+ 17+ 34+ 89 
19+610+ 78 
39+ 48+ 510 
13+ 56+ 47 
4+ 37 
15+ 36 
46+ 57 


Selection of blocks (+, +, +), (+, —, +), (-, —, +), (-, +, +) gives 
a characteristic diagonal X’X sub-matrix for factors 1, 3, 4, 5, 6, 7, 19, 39, 
13, 14, 15, 46, and the effects associated with them in the principal block, 
as written above. However, for factors 8, 9, 10 


oe 
= 


1 


8 910 16 35 
column and/or row 
identification for 9 810 67 45 
three X’X matrices 10 89 17 34 


Therefore, the two-factor interactions 8 9, 8 10, and 9 10 must be assumed 
to be zero in order to resolve completely the nine-factor design in four blocks 
(64 treatment combinations). If the design were completely soluble without 
this restriction, the existence of a 2° design would be implied, and it is known 
that this does not exist. 

Blocks (+, +, +); (+, dD +); (=, se +), (=, .< +) were chosen to 
estimate orthogonally as many contrasts as possible. However, in the nine- 
factor design there is a considerable degree of redundancy using these selected 
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blocks, (i.e., sets of two and three main effects and interactions confounded 
together in the principal block which conceivably might be resolved in two or 
three blocks respectively). This redundancy suggests a change in the choice of 
blocks. Contrasts confounded with factors 8, 9, and 10 might then be esitmated 
at the expense of orthogonality between factors which have the redundancy. 

Using the four blocks (+, +, +), (+, —, —), (—, +, —), (+, +, —) givesa 
characteristic X’X sub-matrix for factors 1, 3, 4, 5, 6 and 7, with associated 
interactions in the form: 


ae 
X’X = 32}1 2 0 
2s. 
The matrix is non-singular, with inverse 
| 4 -2 
1 


(0X) = B2I@|~? 


—2 1 3 


The two-factor interactions associated with 19, 39, 13, 14, 15, and 46 are 
estimated orthogonally. 


Factors 9, 6 7, 4 5, 8 10; 10, 1 7, 3 4, 8 9; 8, 1 6, 3 5, 9 10 give: 


e —1) 
2 0 


1 0 
—-1 0 2 
The matrix is non-singular with determinant = 4 and inverse 
4-2 -2 2 | 


. —— oo oe 
(X’x)? = —— 
eed ae oe _ 


2-1 -1l 2. 


Thus, with the introduction of non-orthogonality and coincident loss of 
precision with six of the main effects, the nine factor design can be resolved 
completely in four blocks (sixty-four treatment combinations). With the same 
amount of experimentation, and with three interactions assumed to be zero 
a priori, the whole design can be resolved in four blocks instead of the full eight, 
with complete orthogonality for all effects estimated. If the three assumptions 
are dangerous, all contrasts may be estimated simultaneously without assump- 
tions, but payment for this privilege must be made in the form of non-or- 
thogonality. It is apparent that this design, with sixty-four treatment combina- 
tions, is more effective per experiment than the pure Resolution-5 design, with 
its one hundred twenty-eight experiments. 
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Block Choice. It is now evident how blocks may be chosen. Using the case 
of the nine factor design as illustration, the two sets of blocks for the sixty-four 
treatment combinations were: 


B, =358 B,=3410 B, = 
4. 


910 


+ 
Orthogonal, but not + 
completely resolved — 


8 
te 
af. 
4. 
he 
+ 


Non-orthogonal, but 
completely resolved _ 


In the first table B, and B, contain an equal number of + and — signs, while 
B; is all of one sign. Therefore, the blocking factors 8, 9 and 10 are not resolved, 
and trouble in the design is concentrated in this group. In the ten-factor and 
eleven-factor designs it will be noted that this concentration is logical, since 
these factors cannot be completely resolved anyway, there being only four 
equations to solve for five or six unknowns respectively. In the second table 
of blocking signs, the + and — signs are not balanced in any of the B; ,and 
therefore complete orthogonality cannot be expected with any of the factors 
contained in the blocking contrasts. Sub-matrices of similar structure to those 
factors 3, 4, 5, 8, 9 and 10 will have similar characteristics. Thus factors 1, 6 
and 7 are also non-orthogonal. 

Tue Ereut Factor DrEsiGgn 

A 25? design is obtained using as generators the defining relations: 
= 12347=12568 

with blocking contrasts: 

B,=135 B,=348 


The principal block (B, , B, with signs +, +) estimates the following linear 
combination of effects: 


l+o05 12+78 

2+ 46 14+ 58 
3+15+48 16+37 
4+38+26 17+28+36 
5+13+67 18+27+45 
6+57+24 23+68 

7+ 56 25+47 
8+34 
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Selection of the principal block plus any other block enables the design to 
be resolved only when nine two-factor interactions are assumed to be zero. 
Selection of three blocks (+, +); (+, —); (—, +) gives the following character- 
istic X’X matrix for the effects associated with factors 1, 7, 8, 12, 23, 25: 


Sear" *|:  @a wake! * 
1 3 wink a 
and for the remaining pairs, those associated with factors 2, 14, 16: 
rx-14 * “|; ap =-—_|* '1; 
ek a (16)(8) |, 3 
In both these cases, use of two of the three blocks provides: 


1 
32 


For factors 3, 1 5, 48; 5, 13, 6 7;17, 28,3 6;18, 27,45: 
-) 4 
XX=161 3 -1 
t+} 8 


XX=32L; (X)'=—1, 


The matrix is non-singular, with inverse 
| 2-1 -1 
—|] 
For factors 4, 3 8, 2 6; 6, 5 7, 2 4: 
[ 3 


XX=16 1 38 -1| 
a iw’ 


The matrix is also non-singular, with inverse 


4-1 11 
4 th: = 


(x)! = ; 
1 1 + 


cinta a 
~ (16)(10) 32 

Thus, with eighteen of the thirty-six estimates non-orthogonal, the eight 
factor design can be completely resolved in three blocks or forty-eight treat- 
ment combinations. The estimates are non-orthogonal whenever three blocks 


are used, and the precision of these estimates is slightly higher than for the 
two-block orthogonal estimates. 
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Tue Seven Factor Design* 


A seven factor design of form 2}-* can be obtained using as fractionating 
generator the defining relation: 


I=1234567 


This design can be arranged in eight blocks of eight treatment combinations 
each. The blocking contrasts are: 


The 2*-*~* principal block estimates the following linear combination of 
effects: 


1+25+36+47 
2+15+37+46 
3$+164+27+45 
4+17+264+35 
§5+12+34+67 
6+13+24+57 
7+144+23+4+56 


Selection of blocks (+, +, +), (+, —, —), (—, +, —), (+, —, +) gives 
the characteristic unit X’X matrix for factors 1, 2 5, 3 6, 4 7. 
For factors 2, 1 5, 3 7, 4 6; 3, 1 6, 2 7, 4.5; 7, 14, 23, 56: 


2 ¢ =] 1 
2 —] 

—1 1 
1 —1 . 2 


This singular matrix becomes non-singular by elimination of any row or 
column. 


For factors 4, 1 7, 2 6, 3 5; 5, 1 2, 3 4, 6 7; 6, 13,24, 57: 


This singular matrix also becomes non-singular by elimination of any row 
or column. 


Therefore, the seven factor design (estimating 29 effects) can be resolved 


* See also Reference 4, 
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in four blocks (32 experiments), only with six assumptions of interactions 
equal to zero and with all estimates non-orthogonal. 

Selection of an additional block (—, —, —) enables the design to be determined 
with three assumptions of interactions equal to zero. The seven factor design 
is then resolved in forty treatment combinations if three two-factor interactions 
are assumed to be zero. All the estimates are non-orthogonal. 

Addition of a further (sixth) block allows the design to be resolved if one 
interaction is assumed to be zero. A seventh block enables the whole design to 
be resolved in fifty-six treatment combinations, but, as in the previous cases 


with this design, the estimates obtained are non-orthogonal, with strong bias 
on each. 


VARIANCE ESTIMATES 


For both orthogonal and non-orthogonal cases the methods described will 
yield values of the effects (contrasts) identical to the b; or the b;; of the re- 
gression model 


k 
Y= > 0.X;:+ > bX:X; 


t=0 ij 


The variances of the effects can, therefore, be calculated in terms of the diagonal 
elements of the inverse matrix, c*’, as 


a, = of 

In this relationship, i is the subscript on the coefficient in question; s’ is the 
mean square residual measuring lack of fit, an estimate of the failure of the 
data to fit the model; thus, it may, or may not, represent the error estimated by 
replicate measurements. 

While the various s;, may be obtained by the calculation of c‘*‘, many will 
be available directly from a consideration of the properties of the design; others 
may be synthesized by consideration of the linear combinations producing the b; . 

In the orthogonal cases, X’X is always a unit sub-matrix multiplied by a 
scalar. The latter is equal to the number of experiments performed in this portion 
of the design. Therefore, c** is the reciprocal of the number of experiments. For 


example, using the specified 4 blocks in the 2**~* design, c** = 1/64 for all effects 
except those confounded with variables 8, 9, 10. 


X’X = 64 I,, and therefore, c’* = 1/64. 


When the matrix is not a unit matrix, but is non-singular, the c** may be 
obtained from the inverse. For example, in the 9-factor design, using the four 
blocks with unbalanced signs in order to obtain a solution for all factors, the 
(X’X)~* for factors 9, 6 7, 4.5, and 8 10 has been shown to have as diagonal 
inverse elements 1/16, 1/32, 1/32, and 1/32 respectively. These are the necessary 
c’* for this case. 

Finally, cases arise where the eventual resolution of the coefficients has been 
recommended to be effected by adding specific blocks to the original orthogonal 
group. As an example, in the 10-factor design, the factors 8 + 910 were not 
separated in four blocks of 16 experiments each. Separation is possible by adding 
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an additional group of 16 experixents if the contrast 16 can be assumed to be 
negligible. The X’X-matrix for this case may be written, and the c‘‘ for 
factors 8 and 910 determined. However, the variances for these two factors 
may be obtained directly by considering the linear combinations by which 8 
and 9 10 are separated. In the added block, the confounding is 


8 -—910+35 — 27. 


To the contrast representing this combination add contrasts representing 
(8 + 910), 27, and — 3 5, as previously determined from the four blocks of 
16. Since the contrasts and the b’s are identical, 


2bs = bs+910 — bss + bez + bs-9 1043 5-27- 


All of these coefficients and their variances are estimated after performing the 
fifth block. 
Therefore, 


er |: ae ee i) 
i-e(S+h+dsd 
8, = 1.325 8/8. 


The coefficient for 9 10 is obtained in the same manner and its variance is identi- 
cal to that for factor 8. The efficiency of the estimate for factors 8 and 9 10 is ob- 
viously 1/(1.325”) = 0.57, compared to that of factors resolved orthogonally 
in the original four blocks. 

If the X’X-matrix for factor 8, 9 10, 2 7, 3 5, obtained from all 80 experiments 
had been inverted, the same c** will be obtained for the factors 8 and 9 10. 
However, if this same matrix is used for factors 2 7 and 3 5, the c** will remain 
1/64 as in the original four blocks. With this fact in mind, the authors have 
chosen not to include the fifth block with any factors orthogonally resolved in 
four blocks. In Dykstra’s designs (3) for partial duplication, there are potential 
gains in precision which may be realized at the expense of non-orthogonality. 
Since in the current paper, similar gains are not always apparent, orthogonality 
has been arbitrarily chosen throughout where it is available. 


SUMMARY 


The number of treatment combinations (experiments) necessary to estimate 
the effects in multivariable Resolution-5 designs can be considerably reduced 
when the number of variables exceeds 6. The price of reduction is either the 
loss of a few interaction terms or the introduction of some non-orthogonality. 
The latter is not entirely avoidable in any of the suggested designs, summarized 
in the following table. 


APPLICATION 


One of these designs, specifically the 2°°-*, has been used to investigate the 
influence of various variables in the spinning of mixed polymer blends. After 
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Resolution-5 Design Reduced Resolution-5 Designs 


Confounded Pairs 
Blocks _ Block Size Blocks ; Remaining 


8 16 1 


16 


aoaurk oN Fe OTE Oe 


* All estimates non-orthogonal. 


completion of the first four blocks, it was apparent that all of the necessary 
six assumptions of negligible interactions could not be tolerated. Therefore a 
fifth block was added, with entirely satisfactory results. The detailed results, 
and their interpretation, will, in the near future, be submitted for publication 
in another journal. 
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APPENDIX 


The 2~* design will be used as an example of the procedures referred to in 
the paper. 

Fractionating generators: 12348, 23569, 245710, 123456711 

Blocking generators: 3 5 8, 34 10, 89 10 
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In the principal block, all blocking generators have positive sign and the sub- 
matrix representing design conditions for each of the confounded factors 


8,27,16,35, 411,910 


will, therefore, be a vector, as shown in the original design matrix, under 
variable 8 


ee ea se i me me) 
The resulting 6X6 X’X sub-matrix for these confounded effects will be 
8 27 16 35 411 910 
: x 1 1 
1 


1 
1 
1 
1 
1 
1 


1 
1 
1 
1 
1 
1 


In another block, for example that in which the blocking contrasts are 
B, = +358; B, = —3410; B; = +89 10, it is necessary that the signs of 
one or all of the factors 3, 4, and 10 be changed to produce the sign change on 
B, . However, in making this change, no change in signs of B, or B; can be 
tolerated, nor can the signs of the fractionating generators change. Thus, if 
the sign of 3 were to be changed, and this one alone, B, would carry the negative 
sign and three of the four fractionating generators would also be changed. 
To prevent this situation, inspection will readily show that signs of 8, 9, and 
11 must also be changed when 3 is changed. 

This group of sign changes to produce the (+, —, +) block is not unique. 
For example, suppose that one chooses to change the sign of variable 4. Then, 
by a similar process of reasoning, one finds that the signs of 1, 7, and 11 must 
also change. Further, if simultaneous changes in the signs of 3, 4, and 10 are 
chosen as the starting point, the sign of 8 must be changed to retain the positive 
sign on B, and B; ; these changes require the further change of the signs of 2 and 
7 to maintain the signs of the fractionating generators. 

Therefore, to obtain the (+, —, +) block from the principal block, the re- 
quired changes in sign involve the following variables: 


ken 
a 
@23144 4 & * 


Choose the second arrangement. Examine the X’X sub-matrix for the effects 
confounded with variable 8. Since an X’X matrix must be symmetrical, only 
the upper half need be represented. 





REDUCED DESIGNS OF RESOLUTION FIVE 
8 27 16 35 411 910 
vi =~) <i +2 42. #1 
+l +1 =i =i —3 

+1 -1 =-1 <-!1 

+i +5 +1 

+i +i 

+1 


This sub-matrix is simply obtained from the principal block by: first, apply 
the indicated sign changes in row one; second, in the second row, multiply all 
subsequent values in the new row one by the —1 for effect 2 7; third, in the third 
row, multiply all subsequent values in row one by the —1 for effect 1 6; etc. 

(Note that the identical sub-matrix is obtained using any of the suggested 
combinations of sign changes. However, it is essential in the other two cases 
to realize that the diagonal elements are necessarily positive. Thus, all signs 
of the first row must be changed if the sign under 8 initially shows —1.) 

Now composite the results for the first two blocks by matrix addition: 


8 27 16 35 411 910 
2 0 0 2 2 
2 2 


0 
2 0 
2 
2 


2 


This matrix is obviously singular, representing an attempt to solve for six 
unknowns in 2 equations. 

The procedure may be followed for any number of blocks. The sub-matrices 
in this paper were investigated in this manner. 
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Irregular Fractions of the 2" Factorial 
Experiments’ 


SmNEY ADDELMAN” 


Iowa State University 


The development of fractional replicate plans which consist of irregular fractions 
of factorial experiments is presented. The method of constructing irregular fraction 
plans is developed for the general s" factorial experiment. However, the plans which 
are found to be of greatest practical value are plans for the 3/2” replicate of the 2” 
experiment. Although these plans introduce correlations between some of the esti- 
mates, the correlations do not affect individual tests on the parameters. Some irregular 
fraction plans permit the estimation of main effects and two-factor interactions with 
fewer trials than is required with an orthogonal plan. A straightforward procedure 
for obtaining estimates of the main effects and “important” interactions and estimates 


of their variances, covariances and correlation coefficients is presented for the 3/2™ 
replicate plans. 


INTRODUCTION 


There are many situations in which an experimenter must estimate the im- 
portant effects and interactions of a symmetrical factorial experiment with 
as few trials as possible. There are instances where, say, all two-factor interactions 
are important and a fractional replicate plan which permits orthogonal estimates 
of all main effects and two-factor interactions requires more trials than one can 
afford to make. If the experimenter is restricted to plans of the type represented 
by a 1/s” replicate of the 2" experiment, he then must either abandon the in- 
vestigation or choose a more highly fractionated plan and assume that some 
of the two-factor interactions are negligible. 

Consider, for example, a situation where it is desirable to estimate the main 
effects and two-factor interactions in an experiment involving seven factors, 
each having two levels, and no more than 50 trials can be made. It is known 
that a 4 replicate of the 2’ experiment defined by the identity relationship 
I = X, where J is written in place of the mean, u, and X contains at least five 
factors, allows uncorrelated estimates of all main effects and two-factor inter- 
actions, when three-factor and higher order interactions are negligible. How- 
ever, this plan requires 64 trials, exceeding the maximum number which can 
be made. It is also known that there does not exist a } replicate of the 2’ ex- 


periment which allows uncorrelated estimates of all main effects and two-factor 
interactions. 


‘This research was supported by the United States Air Force under Contract No. AF 


33(616)-5599, monitored by the Aeronautical Research Laboratory, Wright Air Development 
Division. 


* Now at the Research Triangle Institute. 





480 SIDNEY ADDELMAN 


The 3 replicate plan requires 64 trials and the } replicate plan requires 32 
trials. It then seems reasonable to inquire whether a plan can be constructed 
with 48 trials which will yield information on all main effects and two-factor 
interactions. The consideration of all possible subsets of 48 treatment combina- 
tions from the totality of 128 possible combinations in the 2’ experiment, for 
example, is a tedious taks. Therefore, an investigation has been made of the 
use of subsets which consist of a number, k, of the possible s” distinct 1/s” 
fractional replicates defined by an identity relationship. Such a subset of the 
total possible treatment combinations will be called an irregular fraction, if 
k # s‘, where ¢ is a positive integer. If k = s‘, one would prefer a 1/s”~‘ regular 
fraction, so this case is excluded. 


DEVELOPMENT OF IRREGULAR FRACTION PLANS 


The treatment combinations of the s” experiment may be represented as 
points of an n-dimensional lattice with axes x, , 2; , --* , %, . Each axis of the 
lattice will have s points, these points being the elements of the Galois field 
GF(s). Thus, for example, the treatment combinations of the 2” factorial ex- 
periment can be represented by the points (0, 0), (1, 0), (0, 1) and (1. 1). The 
two factors may be denoted by A and B. The points (0, 0) and (1, 1) both 
satisfy the equation 7, + x. = 0 (modulo 2) and comprise a 3 replicate of the 2° 
experiment defined by the identity relationship J = AB. The symbol AB, can 
be used to denote the set of treatment combinations for which x, + x, = 0 
(mod 2). Similarly the symbol AB, denotes the set of treatment combinations 
for which xz, + x. = 1 (mod 2). 

Consider n factors, A, B, C, --- , L, each with s levels. Then ABC --- L; 


denotes the set of treatment combinations for which x, + 22 + 2%, + +--+: +2, = 7, 
where 7 is an element of the Galois field GF(s). A subscript 7 can be associated 
with every effect or interaction of the identity relationship. For example, the 
treatment combinations of the four possible } replicates of the 2° experiment 
defined by the identity relationship 


I = ABC = ADE = BCDE 

are those treatment combinations which satisfy 
(1) 2, +2. +2; = 0(mod2), 2x, +2, +25 = 0 (mod 2), 

Lo + 2, + a, + 25 = 0 (mod 2) 
(2) 2, +2. +23; = 0(mod 2), x, +2, +2; = 1 (mod 2), 

Lo + ts + ty + 25 = 1 (mod 2) 
(3) a2, + 22+ 23 = 1(mod2), 2, + 2, +25 = 0 (mod 2), 

Lo + ty +t, + 2X5 = 1 (mod 2) 
(4) 2, +2.+23; = 1(mod2), 2x, +2, +2; = 1 (mod 2), 


tp + 23 + 2, + 5 = 0 (mod 2) 
respectively. 
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These four } replicates can be represented by the four relationships 


I = ABC, = ADE, = BCDE, 
I = ABC, = ADE, = BCDE, 
I = ABC, = ADE, = BCDE, 
I = ABC, = ADE, = BCDE, 
or can be displayed in tabular form as in Table 1. Since the interaction BCDE 
TABLE 1 


Structure of the 1/4 Replicates of a 25 Experiment 


Identity relationship Fractional replicate 





I 
ABC 
ADE 

BCDE 





is the generalized interaction of ABC and ADE, the subscripts associated with 
BCDE in the four fractional replicates may be obtained as the sum (mod 2) 
of the corresponding subscripts of ABC and ADE. 

The treatment combinations in each of the four possible } replicates defined 
in Table 1 are presented in Table 2. 


TABLE 2 


Treatment Combinations of the 1/4 Replicates of a 25 
Experiment Defined By I = ABC = ADE = BCDE 


Fractional replicate 


00000 
00011 
01100 
01111 
10101 
10110 
11001 
11010 abde 


00001 
00010 
01101 
01110 
10100 
10111 
11000 
11011 


00100 
00111 
01000 
01011 
10001 
10010 
11101 
11110 


ioe boo d ob 

| 

onouud ood dg 
ll 


abd 


For a more detailed account of the representation of treatment combinations 
as points in n-dimensional space the reader is referred to section 16.2 of 
Kempthorne (1952). 

The treatment combinations which constitute a 3 replicate plan for the 2° 
experiment may be obtained by selecting the treatment combinations which 
occur in any three of the four } replicates given in Table 1. 
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If each of the k subscripts associated with an effect or interaction in the 
identity relationship defining a k/s” replicate of the s* experiment are identical! 
then that effect or interaction is completely confounded with the mean, u. 
If the k subscripts are not identical then that effect or interaction is partially 
confounded with the mean. To illustrate this concept we present a ? replicate 
plan for the 2* experiment. The treatment combinations may be determined 
from the three fractional replicates presented in Table 3. 


TABLE 3 
Structure of a 3/8 Replicate of a 2* Experiment 


Identity relationship Fractional replicate 
1 2 3 


The use of the three distinct $} replicates requires testing the following 6 
treatment combinations: b, ad, cd, abc, bd, a. Interpretation of the experimental 
results depends, of course, on the true values for each combination. In Table 4, 
each of the 6 true values is expressed in terms of effects and interactions (omitting 
the usual multiplier of $ for all terms except u), so that the true value for 
isu—-A+B-—AB—C+ AC ete. 

It is clear from Table 4 that any function of the yields which estimates the 
ABC interaction also estimates the mean, u. Thus, the ABC interaction is 
completely confounded with the mean. In the same way, A is completely con- 
founded with BC, B with AC and C with AB. If the coefficients corresponding 
to the A effect are compared with those corresponding to the B effect it is found 
that the coefficients of the treatment combinations in two of the 4 replicates 
are opposites and in the third are identical. This fact indicates that A is partially 
confounded with B and AB is partially confounded with u. 

The following theorem, which can be easily verified, is helpful in the con- 
struction of irregular fraction plans. 


Theorem 1: In a k/s”™ replicate plan for the s” factorial experiment, no main 
effect or interaction need be completely confounded with the mean if k > (m + 1). 


Corollary: In a k/s™ replicate plan for the s” experiment, if k = (m — u), 
where u = 0, 1, 2, --+ , then (u + 1) effects and/or interactions and their general- 
ized interactions will be completely confounded with the mean. 


If k = (m — u), it is often possible to construct an irregular fraction plan 
so that the interactions which are completely confounded with the mean will 
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TABLE 4 


True Values of Treatment Combinations in Terms of 
Effects and Interactions 


ad cd 


contain at least five factors. This can always be arranged if there exists a 1/s*** 
replicate of the s” experiment which permits uncorrelated estimates of the main 
effects and all two-factor interactions, when higher order interactions are negli- 
gible. If it is not possible to have only five-factor or higher order interactions 
completely confounded with the mean, then some two-factor interactions will 
not be estimable. 


The use of irregular fraction plans is most effective with factors having two 
levels. The yield of a treatment combination in a k/2” replicate plan for the 2" 
experiment can be written in terms of the main effects and interactions: 


Yi... =~ BAA +4+4B4+4AB434C+434AC, ete. + error 
where the sign 
on A is — ifi = Oand + if7 
on B is — if j = Oand + if j 
on C is — ifk = Oand+ifk 
and so on, 


and the sign on a term involving several letters is the product of the signs on 
the individual letters. 


Of the k subscripts associated with a member of the identity relationship, 
let t be 0 and (k — t) be 1. The following theorem can easily be verified: 


Theorem 2: If an interaction in the identity relationship defining a k/2” repli- 
cate of the 2" experiment has an odd number of factors, the off-diagonal elements 
of the information matrix which are associated with that interaction are equal to 
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(k — 2t)2*””. If an interaction in the identity relationship has an even number 
of factors the off-diagonal elements of the information matrix associated with 
that interaction are equal to —(k — 2¢)2”™”. 


We shall adopt the rule that an odd-factor interaction will have an odd 
number of the k subscripts associated with it equal to 1 and an even-factor 
interaction will have an even number of the k subscripts associated with it 
equal to 1. With this rule the off-diagonal elements of the information matrix 
associated with each member of the identity relationship, for which the absolute 
value of (K — 2t) is constant, will have the same value. 

If we also adopt the procedure of grouping the effects and interactions of 
interest in such a way that those which are partially confounded with each 
other are contiguous then the information matrix will consist of various sized 
blocks about the main diagonal with off-diagonal blocks of zero elements. The 
variance-covariance matrix can be then be obtained by inverting each of the 
blocks about the diagonal separately. 

As an illustration of an irregular fraction plan consider a 2 replicate of the 2‘ 
experiment. Since k = 3 and m = 2, no interaction need be completely con- 
founded with the mean. The three fractional replicates defined by the identity 
relationship J = ABC = ABD = CD are represented in Table 5. The twelve 


TABLE 5 
Structure of a 3/4 Replicate of a 24 Experiment 





Identity relationship Fractional replicate 
1 2 3 


I 
ABC 
ABD 

cD 


treatment combinations which comprise the ? replicate plan are:0000,01 11, 
1011,1100,0001,0110,1010,1101,0010,9101,1001and1110. 
These treatment combinations can be represented by (1), bed, acd, ab, d, be, 
ac, abd, c, bd, ad and abc, where (1) denotes the treatment combination with all 
factors at the 0 level and the absence of a letter in any treatment combination 
denotes the corresponding factor occurs at the 0 level and the presence of a 
letter in the treatment combination denotes the corresponding factor occurs 
at the 1 level. 

If the three and four-factor interactions are negligible the yields of the treat- 
ment combinations may be expressed as 


Yim = p+3CD+34A + 3BC + 3BD + 4B 4 34AC4+34AD2 3C 
+ 4D + 34AB + error. 
This relationship may be written in matrix notation as 
y=X6+e 
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where 


[ (1) 
bed 


=} 
=] 
-] 
=|] 


=~] 
] —} 
1 =] 
1 











—l 1 —1l 
The information matrix is given by 


12 —4 
—4 12 


13 —4 —4 
—4 12 -—4 
L ~4Z@ —4 12 





It should be noted that the information matrix can be determined directly 
from Table 5 using the result of Theorem 2. 

In a 3/2” replicate plan of the 2” experiment the blocks which lie on the 
diagonal of the information matrix are of the form 2”-”**I — 2”-”J, where I is the 


identity matrix of rank p and J is a p X p matrix of 1’s. It is easily verified that 
the inverse matrix of 


[al + bJ] 
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is given by 
1 b 
a 1 at bp! | 
Hence the variance covariance matrix for the 3/2” replicate plan consists of 
blocks on the main diagonal of the form 


1 1 
op-at3 r+, 131. 
When p = 2 the block on the diagonal of the variance covariance matrix is 


sons (I + J) =? / 
is 


and when p = 3 the block is given by 
211 
1 1 
prema ll + J) = Sms 121)}- 
112) 
The variance covariance matrix for the ? replicate of the 2‘ experiment is thus 
3 1 
13 
422 
242 
1 224 
32 422 
242 
224 


(X’X)* = 


422 
242 
224 


The estimates are given by 8 = (X’X)~'X’y and may be presented in the following 
form: 


32a = 3T + [CD] 
16CD = T + 3[CD] 
8A = 2[A] + [BC] + [BD] 
8BC = [A] + 2(BC] + [BD] 
SBD = [A] + [BC] + 2[BD] 
8B = 2(B] + [AC] + [AD] 





wher 
sum 
[BC] 
posit 


varia 


var ( 
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SAC = [B] + 2[AC] + [AD] 
8AD = [B] + [AC] + 2[AD] 
8C = 2[C] + [D] + [AB] 
8D = [C] + 2[D] + [AB] 

- 84B = [C] + [D] + 2[AB] 


where 7’ denotes the sum of all the treatment combinations, [A] denotes the 
sum of the treatment combinations containing a, minus those not containing a, 
[BC] denotes the sum of the combinations whose expectations contain BC 


positively, minus the sum of the combinations whose expectations contain BC 
negatively, and so on. 


The variances and covariances of the estimates are easily obtained from the 
variance-covariance matrix, namely: 


var (A) = var (B) = var (C) = var (D) 
var (AB) = var (AC) = var (AD) = var (BC) = var (BD) =o¢/2 
var (CD) = var (2) = 307/8 
cov (A, BC) = cov (A, BD) = COV (BC, BD) = cov (B, AC) = cov (B, AD) 
= COV (AC, AD) = cov (C, D) = cov (C, AB) = cov (D, AB) =o/4 
cov (2A, CD) =o¢'/8 


all unmentioned covariances being zero. 


The estimates of the 3/2” replicate plan for the 2" experiment can easily 
be shown to have the following forms: 


(i) If X, Y and Z are effects and/or interactions which are partially con- 
founded with one another, the estimates of X, Y and Z are given by 


x 21 1 [X] 
fr =n a 
Z L1 1 21[Z] 
var (X) = var (Y) = var (Z) = o/2""" 
cov (X, Y) = cov (X, Z) = cov (YL, Z) = o°/2" 


(ii) If u, X and Y are partially confounded with one another the estimates 
of u, X and Y are given by 


2A 211) 7 
X| = yet 1211] [X] 
Y 11 2) [Y] 
var (X) = var (Y) = o°/2"""" 
cov (2a, X) = cov (2a, Y) = cov (X, Y) = a?/2"™ 
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(iii) If X and Y are effects and/or interactions which are partially confounded 
with each other and with no others, the estimates of X and Y are given by 


ple le 
P| 7 Li alley] 
var (X) = var (Y) = 307/2""*', cov (X, P) = 0° /2"""*". 


(iv) If » and X are partially confounded with each other and with no others, 
the estimates of » and X are given by 


fella 


var (2a) = var (X) = 30/2"""*!, cov (28, X) = 0? /2""""” 


A 3 replicate of the 2* experiment will allow uncorrelated estimates of all 
main effects if the three and four-factor interactions are negligible, but will not 
allow unbiased estimation of all two-factor interactions. The variances of the 
main effect estimates are equal to o”/2. Thus the 2 replicate plan permits esti- 
mates of all main effects and two-factor interactions, the main effect variances 
being at least as small as those obtained from a } replicate plan. However the 3 
replicate plan has introduced correlations between some of the estimates of up 
to 3. Although these correlations make joint tests of significance somewhat 
tedious, they do not affect individual tests on the parameters. In the exploratory 
situations, in which plans of the type discussed in this paper would be used, 
individual tests are the main concern. 

It has become evident that irregular fraction plans often permit the esti- 
mation of some interactions which must be assumed negligible in a comparable 
orthogonal fractional replicate plan. The interpretation of a highly fractionated 
experiment which permits, say, estimation of main effects under the assumption 
of no interactions can be misleading. It would therefore be informative to use 
an irregular fraction plan which permits some evaluation of the possible existence 
of interactions. If this evaluation gives no strong indication of interactions, one 
would then proceed with estimation on the supposition that there are no 
interactions. 

There are several possible criteria by which one can measure the efficiency 
of irregular fraction plans. We shall adopt the criterion of measuring the efficiency 
of main effect estimates of irregular fraction plans when the estimable inter- 
actions are not assumed to be negligible, relative to the estimates of an orthogonal 
main-effect plan with the same number of trials. 

For the ? replicate of the 2* experiment the efficiency can be defined as the 
ratio of the variance of a main effect estimate in an orthogonal main-effect 
plan with 12 trials to the geometric mean of the main effect variances of the 
irregular fraction plan, assuming that the two-factor interactions are not negli- 
gible. This efficiency can be shown to be «7/3 X 2/o” = 2/3. 

An investigation of the properties of the k/2” replicate plan for the 2" factorial 
experiment revealed that the most useful irregular fraction plans are those for 
which k = 3. When k > 3 (k odd) the plans are either inefficient or there exist 
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plans with fewer than k2”"” trials which permit uncorrelated estimates of the 
desired effects and interactions. 

The variance-covariance matrix for a 3/2” replicate of the 2" experiment con- 
sists of blocks on the main diagonal of the form 


1 1 
ame E + i-s 1 

where p is the size of the block. It is evident from the form of the variance- 
covariance matrix that if for any block in the information matrix, p = 4, then 
that block must be singular, so that an inverse of that block does not exist. In 
such a situation one must assume that one of the four correlated effects or 
interactions, which yield the singular block, is negligible and hence reduce the 
size of the block to three. 

Consider the irregular fraction plan for a 3 replicate of the 2° experiment 
defined by the identity relationship J = AB = AC = BC and given in the 
Table 6. It is clear from Table 6 that », AB, AC and BC are partially confounded 


TABLE 6 
Structure of a 3/4 Relicate of a 2? Experiment 


Identity relationship Fractional replicate 
3 


I 
AB 
AC 
BC 


with each other. The block in the information matrix which is related to these 
terms is 


6 —2 -2 -—2 
—-2 6-2 -—2 
—-2-2 6-2 
-2-2=-2 64 


This matrix is singular and therefore cannot be inverted. However, if one of the 
two-factor interactions, say BC, is assumed to be negligible then the mean uy, 
and the interactions AB and AC can be estimated. 

The variance of any main effect or two-factor interaction estimate is equal to 


(5 = p)o* 
C-<_ 


When p = 8 the variance of an estimated effect or interaction is o°/2"""™"’, 
when p = 2 the variance is equal to 30°/2”."** and when p = 1 the variance 
is equal to o”/3.2"""~*. The largest variance is obtained when p = 3. Now, the 
variance of a main effect estimate in an orthogonal main-effect plan with 3.2"™" 
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trials is o°/3.2""""*. Hence the smallest efficiency possible with a 3/2” replicate 
plan of the 2” experiment is equal to 3. 

It is easily verified that the correlation of the estimates in a 3/2” replicate plan 
of the 2” experiment which are partially confounded with each other is equal to 
1/(5 — p). Hence, if p = 2 the correlation is equal to } and if p = 3 the correla- 
tion is equal to }. 

There are various ways of estimating the error variance, o*. An estimate of 
o can be derived from the sum of squares of deviations about the estimated 
values if the number of trials exceeds the number of parameters estimated. 
This estimate can be calculated from the analysis of variance table by taking 
the difference between the total sum of squares and the sum of squares due to 
the parameters and dividing by the number of degrees of freedom associated 
with that difference. 

If the number of trials equals the number of parameters the error variance 
cannot be estimated by least squares procedures. The error variance may have 
been estimated in previous experiments on the same type of data and under 
some circumstances that estimate may be quite adequate for the present experi- 
ment. Another method of analysis is available to the experimenter. If the esti- 
mates of the effects and interactions are arranged in decreasing order of absolute 
magnitude and plotted on half-normal probability paper one can use a procedure 
developed by Daniel (1959) called the method of “half normal plots”’ to esti- 
mate the error variance and to make (somewhat subjective) judgments about 
the reality of observed effects. Although this method was not introduced as a 
general substitute for the analysis of variance its use may prove very helpful 
in making rough evaluations of the existence of effects and interactions. 


INDEX OF IRREGULAR FRACTION PLANS 


The index presented in Table 7 gives some of the more useful 3/2” replicate 
plans for the 2” experiment. For each plan the interactions which generate the 
elements of a suggested identity relationship are also presented. 
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TABLE 7 
Index of Useful Irregular Fraction Plans 


Experiment Fraction Number of Generators of 
trials identity relationship 


23 3/4 6 AB, AC 

24 3/4 12 ABC, ABD 

28 3/4 24 ABC, ADE 

2s 3/4 48 ABCD, ABEF 

26 3/8 2 ABCDE, ABF, AE 
2 3/8 48 ABCDE, ABF, AEG 
2s 3/16 48 ABCDE, ABFGH, ACF, BEG 
29 3/16 96 ABCDE, ABFGH, ACF, BEGJ 
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APPENDIX 


This appendix will be devoted to examples of 3/2” replicate plans of the 2" 
factorial experiment. Each plan is represented by the tabular form of the identity 
relationship and the subscripts associated with the interactions of the identity 
relationship in each of the three (})” factorial replicates. The efficiency of each 
plan is given and the confoundings indicated. Utilizing the forms of the estimates 
given in the body of the paper one can quickly determine the estimates of the 
mean, effects and interactions of interest and their variances and covariances. 


1. 3 replicate of a 2°: 


TaBLe 1A 
Structure of a 3/4 Replicate of a 2? Experiment 


Identity relationship Fractional replicate 





Assumptions: BC and ABC negligible. 
Partial Confoundings: », AB and AC; A, B and C. 
Precision: 
var (A) = var (B) = var (C) = var (AB) = var (AC) =¢ 
cov (2, AB) = cov (22, AC) = cov (AB, AC) = cov (A, B) 
= cov (A, €) = cov (B, C) = o’/2, 


all other covariances being equal to zero. 
Efficiency: 3 
If, in fact, the BC interaction is not negligible, then 


2a-BC)] [211] T] 

ap — Be| = 3)1 21) [4B] 

a ~ ge) |rie2lpac 
2. 3 replicate of a 2*: 


This plan has already been discussed in detail. 
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3. 2 replicate of a 2°: 


TABLE 2A 
Structure of A 3/4 Relicate of A 2° Experiment 


Identity relationship Fractional replicate 
1 2 3 


I 
ABC 
ADE 

BCDE 


Assumptions: ABD, ACD, four and five-factor interactions negligible. Partial 
confoundings: », ABC and ADE; A, BC and DE; B, AC and CDE;C, AB 
and BDE; D, AE and BCE; E, AD and BCD; BD, ABE and CE; CD, ACE 
and BE. 
Precision: All variances equal to o”/4 and all covariances that are not equal 
to zero are equal to o°/8. 
Efficiency: §. 
If, in fact, the interactions ABD and ACD are not negligible, then 


CD — ABD 211] [cD] 


ACE — ABD| = 3|1 2 11|| [ACE] 
BE — ABD 11 2IL [BE] 


BD — ACD 211] [BD] 
ABE — ACD| = %|1 2 1|| [ABE] 
CE — ACD 112 [cE 


4. 2 replicate of a 2°: 


TABLE 3A 
Structure of a 3/4 Replicate of a 2° Experiment 


Identity relationship Fractional replicate 
3 


Assumptions: ACE, ACF and all four, five and six-factor interactions 
negligible. 
Partial confoundings: A, BCD and BEF; B, ACD and AEF; C, ABD and DEF; 
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D, ABC and CEF; E, ABF and CDF; F, ABE and CDE; AB, CD and EF; 
AC and BD; AD and BC; AE and BF; AF and BE; CE and DF; CF and DE; 
BDE, BCF and ADF; BDF, BCE and ADE. 

Precision: 


var (A) = var (BCD) = var (BEF) = var (B) = var (ACD) = var (AEF) 
= var (C) = var (ABD) = var (DEF) = var (D) = var (ABC) 
var (CEF) = var (£) = var (ABF) = var (CDF) = var (F) 
var (ABE) = var (CDE) var (AB) = var (CD) = var (EF) 
var (BDE) = var (BCF ) = var (ADF) = var (BDF) 
= var (BCE) = var (ADE) = 0/8. 
var (AC) = var (BD) = var (AD) = var (BC) = var (AE) = var (BF) 
var (AF) = var (BE) = var (CE) = var (DF) 
var (CF) = var (DE) = 30° /32. 
The covariances of the effects and interactions which are partially confounded 
in sets of three are all equal to o’/16. The covariances of the two-factor 
interactions which are partially confounded in sets of two are all equal to o”/32. 
Efficiency: 3. 


5. 2 replicate of a 2°: 


TABLE 4A 
Structure of a 3/8 Replicate of a 2° Experiment 


Identity relationship Fractional replicate 
1 2 3 


I 
ABCDE 
ABF 

CDEF 
AE 
BCD 
BEF 
ACDF 


Assumptions: EHF, three-factor and higher order interactions negligible. 
Partial Confoundings: 24 and AE; A, E and BF; B, AF and CD, C and BD; 
D and BC; F, AB and BE; AC, CE and DF; CF, AD and DE. 
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Precision: 


var (2) = var (AE) = var (C) = var (BD) = var (D) = var (BC) = 30°/16 


var (A) = var (£) = var (BF) = var (B) = var (AF) = var (CD) 
var (F) = var ( AB) = var (BE) = var (AC) = var (CE) 


var (DF) = var (CF) = var (AD) = var (DE) = o ‘4. 


The covariances of the effects and interactions which are partially confounded 

in sets of two are all equal to o”/16. The covariances of effects and inter- 

actions which are partially confounded in sets of three are all equal to o’/8. 
Efficiency: 0.73. 

If, in fact, EF is not negligible then 


[ B-—EF] [2 


AF — EF | = 3} 1 
|CD — EF 4 


6. 2 replicate of a 2’: 


TaBLe 5A 
Structure of a 3/8 Replicate of a 27? Experiment 





Identity relationship Fractoinal replicate 
1 2 3 





Assumptions: All three-factor and higher order interactions negligible. 
Partial Confoundings: A, BF and EG; B and AF; E and AG; F and AB; G and 
AE; BC and DG; BD and CG; BE and FG; BG, CD and EF; CE and DF; CF 
and DE. 


Precision: 


var (28) = var (C) = var (D) = var (AC) = var (AD) = ¢'/12 
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var (A) = var (BF) var (EG) = var (BG) = var (CD) = var (EF) =¢/8 
var (B) = var (AF) = var (£) = var (AG) = var (Ff) = var (AB) = var (G@) 

var (AE) = var (BC) = var (DG) = var (BD) = var (CG) = var (BE) 


var (FG) = var (CE) = var (DF) = var (CF) = var (DE) = 30°/32. 


The covariances of the effects and interactions which are partially confounded 
in sets of three are all equal to o’/16. The covariances of the effects and 
interactions which are partially confounded in sets of two are all equal to o”/32. 


Efficiency: 0.88. 
7. 3/16 replicate of a 2°: 


TaBLe 6A 
Structure of a 3/16 Replicate of a 28 Experiment 








Identity relationship Fractional replicate 
1 2 3 





1 1 1 
1 1 1 
0 0 0 
0 0 1 
1 1 0 
1 1 0 
0 0 1 
0 1 0 
1 0 1 
1 0 1 
0 1 0 
0 1 1 
1 0 0 
1 0 0 
0 1 1 





Assumptions: All three-factor and higher order interactions negligible. 

Partial Confoundings: A and CF; B and EG; C, AF and EH; D and FG; E, 
BG and CH; F, AC and DG; G, BE and DF; H and CE; AB and DH; AD, CG 
and BH; AE and FH; AG and CD; BC and GH; BD, EF and AH; BF and DE. 

Precision: The variances and covariances of all effects and interactions which 
are partially confounded in sets of two are equal to 30°/32 and o*/32, re- 
spectively. The variances and covariances of all effects and interactions which 
are partially confounded in sets of three are equal to o°/8 and o°/16 
respectively. 

Efficiency: 0.77. 
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8. 3/16 replicate of a 2°: 


TaBLe 7A 
Structure of a 3/16 Replicate of a 2° Experiment 


Identity relationship Fractional replicate 
1 2 3 


COrrocoOorrococorrFK COO Ke 
KF oCcorroOororr COOK 
Orr Or COrrFK COCK CO Ke 


Assumptions: All three-factor and higher order interactions negligible. 

Partial Confoundings: A and CF; C and AF; F and AC; BC and GH; BD and EF; 
BE, DF and GJ; BF and DE; BG, CH and EJ; BH and CG; CE and HJ; 
CG and BH; DG and FJ; CJ and EH; DJ and FG. 

Precision: The variance of each effect and interaction which is uncorrelated 
with any other effect or interaction is equal to o”/24. The variances and co- 
variances of the effects and interactions which are partially confounded in 
sets of two are equal to 307/64 and o°/64, respectively. The variances and 
covariances of all effects and interactions which are partially confounded in 
sets of three are equal to o”/16 and o”/32, respectively. 

Efficiency: 0.96. 
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A General Simulation Programme for Material 
Flow in Batch Chemical Plants 


J. Dyson, P. L. GotpsmitH Anp J. S. M. RoBertson 


Imperial Chemical Industries Limited, Fibres Division, Harrogate, Yorkshire, England 


Techniques for the simulation of material flow through multi-stage full-scale 
batch chemical plants are described and the need for general computer programmes 
is argued. An outline is given of a general simulation programme written in ‘Mercury’ 
auto-code, which is based on the reduction of interstage pipe networks to a small 
number of ‘“‘common mains.” 


1. INTRODUCTION 


With the advent of electronic computers, there has been a considerable increase 
in the number of simulation studies of complex industrial plants. Formerly, 
the labour of hand computation precluded the investigation of any but the 
simplest systems and often restricted the use of the simulation model to a single 
plant configuration. 

The steel industry has been particularly active in such studies with simula- 
tions of the port unloading arrangements for iron ore, of output variations and 
furnace delays likely to result from changes in the operating policy and equip- 
ment of a melting shop, and so on. Large hydro-electric systems have also been 
simulated on an electronic computer, facilitating the study of alternative operat- 
ing procedures and the evaluation of the overall gain in power production re- 
sulting from possible modifications. 

Some of the possibilities and techniques to be used in such simulation studies 
are discussed in reference (2). Most workers in this field appear to favour the 
use of digital rather than analogue machines, partly on account of the avail- 
ability of the former. Also, the general purpose nature of the digital machine 
permits the simulation of a variety of plants, the random elements in the model 
being supplied by an internal generation of pseudo-random deviates. 

In this paper we consider the simulation of full-scale chemical plants, and in 
particular the movement of materials through a batchwise operated plant. 
It is assumed that changes of productivity do not affect the quality of the product, 
but in practice this factor must be carefully watched. 


2. SIMULATION IN THE CHEMICAL INDUSTRY 


In the chemical industry, simulation exercises can be an important step in 
increasing knowledge about plant behaviour. If mathematical equations can 
be set up to describe the physico-chemical behaviour of the actual chemical 
reactor, then it may be possible to solve these on a computer and hence learn 
how to operate that reactor more efficiently. Then again, a study of the flow 
of material through a full-scale plant can pick out the bottlenecks in the current 
arrangement of units, and the simulation model becomes a piece of experimental 
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apparatus on which to try out alternative arrangements. Youle (3) envisages 
a simulation model growing in parallel with the development of a new chemical 
process through the following five stages: Laboratory work, Semi-technical 
plant design, Semi-technical plant operation, Full-scale plant design, and Full- 
scale plant operation. k 

The present paper is concerned with the simulation of full-scale plants. 
Material is passed forward through the various stages of chemical processing, 
each stage consisting of a number of identical plant units. The statistical problem 
associated with such an investigation has the following two parts: 


(i) The determination of the distribution of variable factors such as batch 
cycle times and unit breakdown times: these can usually be obtained by 
a simple statistical analysis of data extracted from plant records. 

(ii) The generation of random samples from these distributions followed by 
numerical operations on the samples which correspond to the behaviour 
of the real plant. 


Chemical plants are built to run in either a continuous manner or with batches 
of material, although a mixture of these modes of operation may be employed. 
In continuous chemical plants, production is upset only by random breakdowns, 
periodic maintenance, or the shortage of raw material. The instantaneous 
capacity of the whole plant is the minimum of the capacities of its component 
stages. A simulation of a continuous plant therefore reduces to a numerical 
imitation of these rather rare events. Alternatively, good approximate solutions 
may be obtained by combining the breakdown probabilities of the separate 
stages. 

In contrast, to study the behaviour of a batchwise operated plant, each 
batch of material is considered as a separate item, together with its associated 
cycle times. It does not appear to be possible to derive analytical solutions to 
this problem except for the simplest plant configurations, and with the com- : 
plexity of real plants it is necessary to use a simulation approach to derive 
useful results. 

The principle adopted in simulating a batch plant on a digital computer is to 
generate and store a continuously changing activity chart for each of the plant 
units. The three main activities of a unit are:(i) Being charged with materials, 
(ii) Processing, and (iii) Being discharged of product. In addition, a vessel may: 
(iv) Wait empty, because of inadequate vessel capacity at the previous stage, 
(v) Wait empty, because of inadequate pipework connections from units in 
the previous stage—a state which is termed ‘temporary waiting empty’, (vi) 
Wait full, because of inadequate vessel capacity at the succeeding stage, or 
(vii) Wait full, because of inadequate pipework connections to units in the 
succeeding stage—termed ‘temporary waiting full’. The computer keeps a note 
of the time at which each activity will end, and scans through the list of times 
to pick out which will finish next, together with the associated unit. By consult- 
ing the logical analysis embodied in its programme, the computer advances the 
unit to the next activity and allocates a new completion time. The table of 
times is scanned again and the cycle is repeated. 

The application of this technique to a particular batch plant is described in (4). 
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The output of the plant and also the delays due to lack of capacity and cycle 
time variability are calculated. Then modifications are written into the simula- 
tion programme to permit the calculations to be repeated with changed mean 
cycle times, reduced variability in cycle times, and with an amended arrange- 
ment of plant units. 


3. NEED FOR GENERAL PROGRAMMES 


The programme for the above application was written in the auto-code lan- 
guage for a Ferranti ‘Mercury’ computer and occupied nine chapters, which is a 
little under half the available programme space. Writing this programme necessi- 
tated a complete system analysis of the possible transfers of material from stage to 
stage, followed by the coding of these operations for the machine. Having written 
the programme for one particular arrangement of plant units, adaptations for 
alternative configurations proved excessively tedious. Hence, the need arises 
for some general programme which will permit a wide variety of batch plants 
to be simulated without the labour of writing an individual programme in 
every case. A general programme would still be desirable, even if speeding up 
the programming stage is offset by a slight increase in running time on the 
computer. A similar requirement has arisen in the steel industry, where a general 
simulation programme is being prepared for the Ferranti ‘Pegasus’ computer (2). 

One way of satisfying this need would be to express the simulation problem 
in a plant-orientated simulation code which would then be automatically trans- 
lated into machine instructions by an interpretive programme (1). This approach 
has the advantage that the simulation programme can be written in terms 
easily understood by the process investigator and the plant manager, but con- 
siderable effort would be required to construct the translator. An alternative 
approach, which is followed in the paper, is to use the existing auto-code lan- 
guage, but to express the simulation programme in generalised terms. 

In some plants, the installed equipment and the balance between the various 
stages is such that the pipework connections are completely adequate and 
cause no hold-up in production. Therefore, a general programme can assume 
that transfers of material are always possible when there is an empty vessel 
in the succeeding stage. The parameters in the programme are simply the 
number of stages and the number of units in each. 

Often, however, pipework limitations will cause delays and must be taken 
into account in the simulation model. One way of doing this would be to regard 
each section of pipe as a plant ‘unit’ capable of the two activities ‘being used’ 
and ‘available for use’. However, this treatment is unnecessarily detailed and 
can be simplified by the useful trick of reducing pipe networks to a small number 
of ‘common mains’. This approach has yielded a programme which simulates 
the plant described in (4) in only four chapters of ‘Mercury’ auto-code, and is 
at the same time a general programme which can be used for all batch plants 
of this type. 


4. Tar Concept or ‘Common Marns’ 


Figure 1 shows a line diagram of another batch chemical plant. In this example, 
there are six distinct stages with either three or four units in a stage. A complex 
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STAGE 2 


STAGE 3 


STAGE 4 


ee. STAGE S$ 


Ea ea STAGE 6 


Figure 1—Line diagram of a typical plant. 
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interstage pipe network is shown, together with the installed valves. The piece 
of pipe between a unit and its first junction with other pipes must always be 
used for material transfers from the unit to which the pipe is joined, so that in 
an activity sense this pipe is an integral part of the unit. A similar consideration 
applies to the piece of pipe leading into a unit from its last junction with other 
pipes. The remainder of the pipework can be reduced to a small number of 
‘common mains’. A common main is defined as an aggregate of pipes such that 
the use of part or all of it for a material transfer between stages of the plant 
automatically prohibits the use of any part or the whole of the aggregate for 
any other simultaneous transfer. The idea is perhaps best understood by reference 
to Figure 2, which shows the application of this principle to three of the pipework 
systems appearing in Figure 1: common mains are denoted by thick lines and 
lettered a, b, ¢, etc. 

Having determined the common mains in the system, a list is constructed 
of those used in each of the possible transfers of material from units in one stage 
to units in the next. This information is then coded in a four digit number, 
a ‘l’ digit denoting that the corresponding common main is needed for the 
transfer and a ‘0’ digit that it is not required. For example, the table below 
refers to the third system depicted in Figure 2. 


Common Mains Employed in Material Transfers of Example 3 in Figure 2 








Common mains code (w) 
Possible transfers from Common mains = —————————— 


stage 5 to stage 6 required 


a 
a 
o 
=~ 


Unit 1 to Unit 1 


rwWNeK WHY & WDD 
ocorr CORP KR eR Re ee 
See et HH OOK HK OO 
cocooocoocococo 
cococoocoocoooocoooo 


In its current form the programme can cater for up to four common mains 
between any two stages, and this number has proved adequate for the quite 
complex systems found in practice. However, if it were deemed necessary, the 
programme could be enlarged to allow for higher numbers of common mains. 


5. SPECIFICATION OF THE ‘Mrrcury’ CoMPUTER PROGRAMME 


The flowsheet for the main programme, which is based on the method de- 
scribed in Section 2 above, is shown in Figure 3. It has been found in practice 
that batch cycle times usually have Normal distributions, which can therefore 
be specified in the input data by sets of mean values and standard deviations. 
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EXAMPLE | 


STAGE {| TO 
STAGE 2 


XAMPLE 2 


STAGE 3 TO 
STAGE 4 


EXAMPLE 3 


STAGE $ TO 
STAGE 6 


Figure 2—Examples of common mains. 
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cycle times. Samples from other distributions could, of course, be generated 


Hence it is sufficient to generate pseudo-random Normal deviates for the typical 
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machine in locations named wo , W; , We , *** Wize . Restricting the possible 
transfers to 180, limits the size of the plant which can be simulated to r stages 
of mn, , M2, °** N, units respectively, such that 


NNz + NNs + NN, + +++ +2,-N, < 180, 
and there is an additional limitation on the total number of units, namely, 
Mm +n tng t+ -e> +n, S 23. 


When there are less than 180 possible transfers, the remaining locations for 
the non-existent pipework are filled with zeros. By packing the details of more 
than one transfer into a single storage location, it should be possible to simulate 
even more extensive systems. 

When the computer follows through the main flowsheet (Figure 3), it will 
periodically conclude that a transfer of material is possible. It then holds the 
number of the unit whose activity is just ending in a location named 7, and its 
stage number in k. The dummy unit number 7 = 0 is used for printing out 
results. The flowsheet for transfers is shown in Figure 4. Location / is used to 
store the number of the unit to or from which a transfer can occur; the programme 
cycles through the possible values of / until it finds a unit in the required state, 
together with free connecting pipework. The common mains between stages k 
and (k + 1) which are being used currently are stored in the coded form in 
location 4-1) , and the availability of pipework for an additional transfer 
between these stages is tested by adding z.,-,) to the appropriate value of w. 


The appearance of a ‘2’ digit in an algebraic addition reveals the impossibility 
of a further transfer between stages k and (k + 1), while their non-appearance 
indicates that the transfer can go ahead. Alternatively, if the test is performed 
by the computer logical instruction AND, any ‘1’ digit in the resultant binary 
number indicates that no further transfer is possible. The plant of Figure 1 can 


be simulated by this programme at a rate of 100 batches in approximately 
three minutes. 


6. EXAMPLE 


As an example of the results obtained from the programme consider the plant 
of Figure 1 together with the following cycle times: 


Cycle Times for the Plant of Figure 1 


Cycle Time ( Min.) 
———_—____—————_ Discharge 
Mean _—_— Standard Deviation Time (Min.) 


152 22 
149 20 
206 15 
190 21 
150 18 
220 17 


meni aig ac NN A ee 





506 J. DYSON, P. L. GOLDSMITH AND J. S. M. ROBERTSON 


It is assumed that at no time is there a shortage of raw materials for Stage 1, 
and also that the product can be directly removed from Stage 6. 

The next table exhibits the plant operation as computed by the simulation 
programme during the second week of production, following a start-up with 
all vessels empty. 


Summary of Second Week of Production 


Waiting times Temporary Waiting 
Unit (Min.) times (Min.) Output Percentage 
—_ —_ Batches Utilization 
Stage Number Full Empty Full Empty made of Stage 
1 1530 ita 0 : 49 
1 2 1437 say : 49 84 
3 1962 = : 47 


526 48 
532 49 
781 


290 
421 
563 
547 


1013 
652 982 
929 819 
1423 953 
539 546 


687 471 
728 607 
544 


926 
914 


1227 


These figures are carefully compared with those obtained on the real plant 
to ensure that the simulation is faithful. The outstanding feature of this par- 
ticular set of results is the small amount of time lost due to inadequate pipework 
connections between the stages. As regards vessel capacity the stages are not 
too badly balanced. Stages 1 and 4 both suffer long waiting full times which 
reflect insufficient capacity at Stages 2 and 5: also units in Stage 6 wait empty 
for considerable periods. These conclusions are confirmed by the values of 
equipment utilization shown in the last column of the table and calculated as 
the average percentage of time that the stage was being usefully employed. 
Cycle time variability and carry-over from week to week cause some variations 
in the total weekly output figures, the average in this case being 146 batches 
with a standard deviation of 1 batch. 
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It is of interest to compare the above results with those obtained when the 
stage cycle times have the same values but are constant i.e. to discover the 
extent to which cycle time variability reduces production. We find that a 9% 
increase in productivity could be obtained from the pliant of Figure 1 if this 
variability were eliminated. 


7. CONCLUSIONS 


The general programme facilitates the simulation of full-scale batch chemical 
plants, making it easier to calculate the performance of the plant in its present 
state as regards: (i) Plant output, (ii) Unit occupation times, (iii) Delays due 
to inadequate pipework and (iv) Effects of cycle time variability. In addition, 
the computer programme can be used as a piece of experimental apparatus, 
so that the performance of the plant can be recalculated to show the effects of 
various modifications such as (v) Output when unit mean cycle times are reduced, 
(vi) Output when cycle time variability is reduced and (vii) Output with alter- 
native arrangements of plant units. Because experimentation on the real full- 
scale plant is likely to upset production, experimentation on its analogue, the 
computer programme, is a very valuable tool in industrial process improvement. 
Such information together with cost data will considerably aid the quantitative 
assessment of future plant policy decisions. 

The general simulation programme described in this paper has been written 
in ‘Mercury’ auto-code and has not required the detailed construction of a new 
translator. Its general nature permits the simulation of a wide variety of batch- 
wise operated full-scale plants as soon as the interstage pipe networks have 
been reduced to a small number of common mains. Furthermore, this programme 
can be used as a basis to which may be added the simulation of semi-continuous 
units, divided batches, storage vessels, plants with recycle and even the avail- 
ability of plant operators where their inefficient use is limiting production. In 
suitably modified forms, it may also be used to optimise the arrangement of 
plants units at the design stage. 

The authors acknowledge with thanks guidance received from Dr. P. V. Youle 
in the preparation of this paper. 
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Multiple isotope dilution is used when a compound C is present in a mixture in so 
minute a quantity that ordinary chemical extraction is difficult. If the compound in 
the mixture can be obtained in a radioactive state, the mixture is diluted with known 
amounts of inactive compound C. The relevant equations are S;(m +2;) = k, (i = 1, 
2, ..., p), where m is the original (unknown) amount of compound C present in each 
aliquot, x; is the amount of inactive compound C added to the 7-th aliquot, and S; is 
the specific activity of the diluted compound C extracted from the i-th aliquot; k is an 
unknown constant. 

Assuming that the error of determining S; is proportional to S;, a maximum like- 
lihood estimate m of m is derived. The dilutions x, 22, ..., Zp are determined such 
that var m is minimized. It is found that this takes place when the observations are 
divided evenly between two dilutions, one at the lowest possible level and the other at 
the highest possible level. Further, for this allocation, an estimate much easier to com- 
pute than the maximum likelihood estimate is found to have asymptotically almost 
minimum variance. 


1. INTRODUCTION 


Consider a mixture containing a compound C in such minute quantities that 
it would be difficult to extract it in a pure form. In such cases, the mixture can 
nevertheless be analysed successfully if the compound C is in a radioactive 
state. The proportion of compound C in the mixture is then increased by adding 
a comparatively large amount of inactive compound C, so that its partial ex- 
traction becomes possible. After thorough mixing, a portion of the diluted 
substance C is extracted and the original amount present in the mixture is 
determined by comparing the radioactivities before and after dilution. (When 
in the original mixture the compound C cannot be obtained in radioactive form, 
a method of dilution with active compound C can sometimes be applied. This 
is discussed in the appendix.) 

To derive the necessary equations, consider an aliquot (Q,) of the mixture, 
containing the unknown quantity m of radioactive compound C of specific 
activity s; (specific activity = total radioactivity per unit mass). If a known 
amount 2, of inactive compound C is added to the aliquot, the total radioactivity 
k of the compound remains unchanged, so that we have the equation 


k = sm = S,(m + 2), (1) 


where S, is the specific activity of the compound C in Q, after dilution. If z, 
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is large compared with m, the total amount of compound C in the aliquot may 
be large enough for a portion of it to be extracted in a pure state. The specific 
activity S, can then be measured and m can be calculated from equation (1), 
provided the original specific activity s is known. In many cases, however, s is 
unknown, so that a second equation is required to determine m. Such an equation 
can be obtained by a second experiment in which a different quantity x, is 
added to another aliquot (Q.) of the same mass as Q, , giving 


k = S,(m + 22). (2) 


When S, is measured, the two equations contain only the two unknowns k and 


m, so that m can be determined by solving the two simultaneous equations (1) 
and (2), viz: 


oa sXe a Sit 
—- (3) 


The method of double dilution was first introduced by Bloch and Anker 
(1948); a numerical example dealing with an actual experiment is given by 
Mayor and Collins (1951). In practice, one usually measures quantities pro- 
portional to the specific activities rather than the specific activities themselves. 
However, since in formula (3) and subsequent formulae the specific activities 
appear only as ratios, the results obtained in this paper continue to hold when 
S, and S, , and later the S;; , are understood to denote quantities proportional 
to the specific activities. 

But S, and S, can only be determined with a certain degree of accuracy. 
The question, therefore, arises whether repeated measurements with different 
dilutions x, , x, , --+ , 2, might lead to a more accurate determination of m. 
In particular, we would like to know what values chosen for 2, , %2 , °°: , % 
would lead to the most accurate value for m. The problem, which was suggested 
to the author by A. M. Downes, C.S.I.R.O., Prospect, may be stated as follows: 

A number n of equal aliquots, Q, , of the mixture containing the radio-active 
compound C are each diluted with the same amount 2, of inactive pure com- 
pound C. When equilibrium is reached between the active and inactive compound 
C, a portion of the pure compound is recovered from each aliquot and measure- 
ments S,, , Si. , --: , S,, of their specific activity are obtained. These measure- 
ments may be regarded as a sample from an infinite population (S,) of possible 
measurements whose mean S, represents the “true” specific activity of the 
diluted compound C. 

Next, n aliquots, Q. , equal in mass to the n aliquots Q, , are diluted with a 
quantity xz. of pure inactive compound C, resulting in the measurements 
S21, Soo, +++ , So, of the specific activity S, of the compound C diluted with «; . 

Continuing in this way with dilutions z; , --- , 2, , we obtain altogether p 
sets of n measurements of the (true) specific activities S, , S., --- , S,. We 
assume that the populations (S,), (S.), --- , (S,) are normal with standard 
deviations o, , 2 , -*: , ¢, proportional to the means S, , S: , --: , 5 
i.e. o; = AS; (¢ = 1, 2, --- , p), where A is independent of 7; . 

Instead of the two equations (1) and (2), we now have the p equations 


’ - 
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k , 
oe oe mt (¢ = 1, 2, ---, p) (4) 


where S; is the mean of a normal population (S;), and S;, , S;., --: , Si, isa 
sample selected from this population. Our aim is, to use the np measurements 
S,; (@ = 1,2, +++ ,p;9 = 1, 2, --- , m) to estimate m and to determine optimum 
values 2, , 22, *** , 2» such that the estimator of m has a minimum variance. 

This is a problem of optimal allocation for the estimation of a single param- 
eter m. Problems of optimal allocation have been investigated by various authors, 
in particular by Chernoff (1953) who deals with the general problem of estimat- 
ing s amongst k parameters when the true values of all parameters involved 
are approximately known. Our problem is a special case where s = 1 and where 
the optimal allocation happens to be independent of the true values of the 
parameters. Further references may be found in Kiefer and Wolfowitz (1959), 
who have approached the same problem using certain results of game theory. 


2. Tat Maximum LIKELIHOOD ESTIMATOR 72 


The j-th (j = 1, 2, --- , m) measurement S,,; of the specific activity S; is a 
normal variate of mean (4) and standard deviation 


where, to simplify the algebra, we use the parameter c = Nk instead of \. The 
variate S;; thus has the probability density 


-4 2; +m {-1( es ie y (zeny 
(2m) c — 2 Si: x;+m c 


depending on three independent parameters m, c, k whose maximum likelihood 
estimators are to be investigated. 

Since all S,;; are independent, the multivariate (S,; , S.; , --- , S,;) has the 
probability density 


fi a f (Si; ? So; eon S,;) 


= On) (oes + 09)" exp {3 5 (8 — Sy"/eth- 


=1 


Hence, using (4) and (5), the natural logarithm of the maximum likelihood 
function L = fif, +++ f, is 


nL = Fut 0 —ap ne +0 3 ne, + 00 


_ os De Le {8.i(e; + m) — k}? + const. 


The maximum likelihood estimators m, ¢, & satisfy the conditions 


_9inL_ din L _ 
am 
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which reduce to the equations 


én Dae + Wm) = DY Sul Sule. + mm) — fi 
np = D1 Do Sides + m)* — 26 DD Sie, + mh) + nphr (6) 


npk = Zz 2 S,;(z; + m). 
Putting 
23 = Six, + m) 
and substituting the last equation of (6) in the two others, we eliminate / and 
subsequently é. This gives 


LS ot my) = 2 de (Su = 8-)eu = 2.) 


*% : 7 

P imi pe Zz (25 — z..)° ”) 
where S.. and z.. stand for the arithmetic means of all the S;,; and z,; respectively. 
To bring this equation, which contains * as the only unknown quantity, into 
a more suitable form, let 


“ui; = Sait; ’ Ut; Uz — Use, Si; = Si: = Bes ; (8) 


where w.. is the arithmetic mean of all the u;; . This gives z,; — z.. = wi; + S',, 
and equation (7) becomes 


oo eg (9 
~ T+ Inv + mS ’ 


: 7 (x; + mm) 


where 


S=DLUSi, U=L Vw, Vs DV Sis. (10) 
For any given set of values x, , x, , --: , , and given measurements S,, , equa- 
tion (9) can be solved for mh. 
In the particular case where p = 2, 2, = a, 2, = b (a < b) (say), equation (9) 
reduces to the quadratic equation 


[(a + b)S — 2V]m? + 2(abS — U)mh + [2abV — (a + B)U] = 0. (11) 
In this important special case S, U, V can be further reduced; we obtain 
S= a Si; + ~ S2; — 3n(S,. + 8:2)’, 
Ve=a ee St; +b Zz S3; — 3n(S,. + S2.)(a8,. + bdS:.), 
U=a’ > Si; +b” > Si; — 4n(aS,, + bS,.)’, 
where S,, and S., are the arithmetic means of the measurements S,; and 
So; (j = 1,2, +++ ,n) atx = aandz = b respectively. Substituting these values 
in (11), the quadratic equation reduces to 
(Ri — R2)m? + 2(aR? — bR2)m + (a’Ri — WR?) = 0, (12) 
where 
Ri= DSi, — Si... = = 1, 2) (13) 
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The positive solution of equation (12) is 


_ bR, — ak, aie 
m = ok * (R, ,R. positive) 


which thus represents the maximum likelihood estimator of m. 


3. THE Minimum ASYMPTOTIC VARIANCE OF 1 


To determine the asymptotic (n — ©) variance of *, we follow the usual 
procedure (e.g. Mood, 1950, p 212). We calculate the second derivatives of 
In f; , introduced in Section 2, and determine their mathematical expectations. 
Using E(S,;) = k/(x; + m) and E(Sj;) = of + Si = (° + k*)/(@; + m)’, 
we obtain (writing f for f;) 


z) = (2 + k*/c)B’, 


where 


A=Da,+m", B= D@+m”. (15) 


The variance of mt is then given by var *t = o,,/n, where o;, is the first element 
of the inverse of the matrix [o*']; we find 


2 

c 
ne? + 22)(B — Ap) - 
To obtain a minimum variance, keeping n and p fixed, we have to choose the 


the values 2, , 22 , °** , 2» such that B? — A’/p is a maximum. Putting 
v; = (x, + m)~’, this expression becomes 


D=B — A*/p= de _ (= v) /p = > (vy; — v.)’. 


=1 


var h = 


Now suppose that the z; can be chosen anywhere in the closed interval [a, b], 
where 0 < a < b. Then »; can be chosen anywhere in [a, 6], where a = (b + m)™ 
and 6 = (a + m)~". To find the maximum of D, let v; , ve , «++ , Up-1 be fixed 
and v = v, variable. Then 


D=Q+"-(P+»)'*/, 
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p-1 


pu Fe,., Q= Vi 


i=1 i=1 
are constants. Now the derivative 


< = 2p — Iv — Pl/p 


is monotonely increasing, being negative for » < P/(p — 1) and positive for 
v > P/(p — 1). Since the expression v’ = P/(p — 1) is the arithmetic mean of 
the fixed values v, , v2 , -** , Vp-1 , its value lies, like these, between a and 8. 
Thus, D increases when v moves away from v’ and reaches it largest value when 
v is equal to a or 8. Since the same reasoning holds for all v; , D cannot reach its 
maximum unless all v; are either at a or 8. It is now easy to show that the maxi- 
mum is reached when the values are divided evenly between a and 8. For let r 
values be at a and p — r values at 8. Then 


D = ra’ + (p — n)B’ — {ra + (p — r)B}"/p 
= (6 — a)’r(p — r)/p. 


The function D of r is represented by a parabola reaching its maximum for 
r = 3p. Thus, when p is even, the largest value of D, and hence the minimum 
variance, is reached when half the observations are taken at + = a and the 
other half at x = b. When 7 is odd, say p = 2q + 1, the minimum variance may 
be obtained in two ways by taking q observations at a and q observations at b 
and the one remaining observation at either a or b. 

Regarding the choice of the constants a and b, it must be kept in mind that 
a must not be too small, for there would be insufficient compound in the mixture 
for pure recovery, and b must not be too large so that the specific activity S, 
does not become too small for accurate determination. For suitably chosen 
values of a and b, however, our assumption of a constant coefficient of variation 
\ seems reasonable. 


For p = 2, x, = a, x; = b, equation (16) reduces to 


2\*(a + m)*(b + m)* 


var th = “n(l + 2X°)(b — a)" (a7) 


An approximate value of var # can be obtained by substituting in (17) the 
maximum likelihood estimates of m and }’, but it is simpler (though possibly 
somewhat less accurate) to estimate m by 


i oT. ae aT, 
T; he Ts; ? 


obtained from (3) by replacing S; by S;, = T;/n , and \ by 


rao #), 


(18) 
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4. ESTIMATION OF m BY ™ 


Since the optimum design of placing all measurements at the two extreme 
values of z has been established, the more direct estimate m of m, based on 


equation (3) and defined by equation (18), may turn out to be almost as efficient 
as the maximum likelihood estimate 7. 
Consider the variates 


X=S8,.-—8:., Y = bS,. — a&,, (21) 
whose means, variances and covariance are 
E£,=8,-—8,, EF, = bS8,—aS8,, (22) 
a=(ci+0)/n, B= (a’oi + b’o2)/n, (23) 
vy = cov (X, Y) = —(ao; + bo>)/n. (24) 


The true value of m is then equal to E,/E, , and its estimate is m = Y/X. 
Following Paulson (1942), we consider the variate 


Z= Y — mX, (25) 
whose mean is zero and whose variance is 
var Z = ma — 2my + B. (26) 


Since.Z will be approximately normal, confidence limits for m can be obtained 
by choosing the number ¢ from normal tables such that 


P< varZ (27) 


can be stated with confidence P (eg. ¢ = 1.96 for P = 0.95). sub- 
stituting (25)—(26) in (27), we obtain the inequality 


(X? — fa)m? — AXY — fy)m+ (Y’ — #8) < 0. (28) 
The roots m, , m2 of this quadratic expression are usually real and represent the 
desired confidence limits for m. 


When the variances of , o; are unknown, the constants a, 8, y have to be 
estimated by means of the unbiased estimates 


= D(8,-8.)’/m@-1). @=1,2) (29) 


Instead of normal tables, we then use Student’s tables with n — 1 degrees of 
freedom (Paulson, 1942) to obtain the appropriate value of ¢ (e.g. t = 2.26 
forn = 10, P = 0.95). 

To compare the precision of m with that of *, we determine the asymptotic 
(n — ) variance var m of m. The variance is given by (see Cochran, 1953, 


p. 128) 
var m = (am? — 2ym + 8)/Ei , (30) 
which reduces to 


2n7(m + a)*(m + by? 
nb — a)? on 


var m = 
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if we substitute o, = AS; (¢ = 1, 2) and take into account that by (4) 
S,: S, = (m + b) : (m + a). Hence, for sufficiently large n, 


var m 


wae an’, (32) 


which in most practical cases is close to one. 


5. ILLUSTRATION BY MEANS OF AN ARTIFICIAL EXAMPLE 


Consider the hypothetical case where the mixture contains m = 30 units of 
radioactive compound C and that this amount is to be determined by double 
dilution with a = 100 and b = 1000 units of inactive compound C. Equation (3) 
then gives the relation 13S, = 1035, , so that the correct value of m would be 
obtained if the specific activities in the experiment were proportional to 103 
and 13 respectively. 

To simulate the experiment, we select two sets of numbers from a table of 
random normal numbers and adjust them so that their population means and 
standard deviations are S, = 103, S. = 13, ando, = AS, , o: = AS. respectively. 
For the constant coefficient of variation, we choose \ = 0.04, so that o, = 4.12 
and o, = 0.52. The “measurements” of specific activity obtained were 


Si; : 100.73 101.76 111.19 104.73 101.85 106.02 95.71 101.73 93.69 105.99 
S.,;: 12.55 14.20 12.48 13.00 12.54 12.15 12.80 11.88 13.22 13.05 


From these we obtain 


> Si; = 1023.40, >> S.; = 127.87, >> Si, = 104967.9756, 
> Si; = 1638.8123,  R, = 229.35,  R, = 28.66, 


and hence * = 28.52 and m = 28.51 (true value m = 30). Again, by (20), 
6} = 25.9133, 63 = 0.4154 and hence ¢, = 5.09, ¢. = 0.64, X = 0.049 (true 
values ¢, = 4.12, o, = 0.52, \ = 0.04). The true value of var mt can be calcu- 
lated from (17) and its estimate s° is obtained by substitution of m and X in 
(17); we find -Y var * = 2.66 and s = 3.24. 

Again, to find confidence limits when m is estimated by m, we find by (23) 
and (24) @ = 2.63, 8 = 67453, 7 = —300.7 (true values a = 1.72, 8 = 44014 
y = —196.8). For 95% confidence limits, Student’s tables give ¢ = 2.26, and 
the inequality (28) reduces to 8011m? — 459268m + 6293003 < 0. The roots of 
this expression are m, = 34.7, m, = 22.7, so that the statement 22.7 < m < 34.7 
can be made with 95% confidence. 
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APPENDIX 


Multiple Dilution with Active Isotopes. 


Suppose that the compound C in a mixture cannot be obtained in radio- 
active form. Any dilution must then be made with active compound. Let m be 
the unknown amount of inactive compound C present in an aliquot Q, of the 
mixture. When a small amount x of radioactive compound C of known specific 
activity Sp is added, the total radioactivity is 


Sox = s(m + 2), 


where s is the specific activity of the compound C after dilution. 

If the quantity m is large enough so that at least a portion of it can be re- 
covered in a pure form, s can be measured and m can be calculated. This is the 
simplest case of quantitative chemical analysis by isotopic dilution. Its ad- 
vantage over ordinary chemical analysis is that the substance need not be 
recovered in toto. The method has been frequently applied for more then twenty 
years; a historical survey of it has been given by Pinajian, Christian, and Wright 
(1953). But if the quantity m is so small (compared with the other substances 
in the mixture) that pure recovery of any portion is impossible, the above 
method is inapplicable because a very large quantity of active compound C 
would have to be added, making s practically equal to S, . 

Now, it sometimes happens that the compound C can be extracted in the 
form of another compound C’ in which it is contained in a fixed but unknown 
proportion P. The specific activity s of the compound C cannot then be de- 
termined, but we can measure the specific activity S of the new compound C’. 
Since the two specific activities are related by the equation S = Ps, we obtain 
the equation 


S(m + 2) = SoxP. 


This equation contains two unknown quantities, m and P. At least two separate 
analyses are therefore required to obtain the necessary equations. One possible 
method of obtaining these is to take two equal aliquots Q, , Q. and add a quantity 
z, of known specific activity S,, to Q, and a quantity 2, of specific activity 
Soo to Q, . If x, and zx, are chosen such that So:2, = Soot, , we have the equations 


S(m+2)=k (¢=1,2) 
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where S, and S, are the specific activities of the compound C’ recovered from 
Q, and Q, respectively. The unknown constant k = S),a,P (¢ = 1, 2) can be 
eliminated, and we obtain 


= Sete — Sidi 
- Si io Se : 


which is of the same form as equation (3). 

It is now clear that the theory developed for the case of multiple dilution 
with inactive isotopes can also be applied to multiple dilution with active isotopes. 
Hence, again, the most accurate value of m is likely to be obtained when half 
the observations are taken with dilutions at the lowest possible level and the 
other half at the highest level. 

There have been as yet only a few applications of multiple dilution with 
active isotopes. One, determining antimony impurities in lead by Zimakov 
and Rozhavski (1958) has been reported by Alimarin and Bilimovitch (1960). 


m 
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The purpose of this paper is to discuss some sequential multivariate techniques 
used in testing means both for the one- and two-sample cases. When the population 
covariance matrix is known, a sequential x?-test is employed. When the population 
covariance matrix is not known, this is replaced by a sequential T*test. Some pro- 
perties of these tests are discussed and certain other procedures are suggested for use 
in testing the assumption that the covariance matrix is known. These sequential pro- 
cedures are then illustrated in a problem involving the acceptance sampling of ballistic 
missiles. 


1. INTRODUCTION 


The motivation for this article stems from the acceptance sampling programs 
used in the evaluation of production lots of ballistic missiles such as the Honest 
John and the Nike. These particular missiles are operational and have been on a 
production basis for some time. They are currently produced in lots and sub- 
jected to the ordinary acceptance sampling schemes used in quality control. 
These sampling systems will be discussed in more detail in Section 3 but it will 
suffice now to say that the manufacture and testing of these missiles is very 
expensive. Consequently, judgments on lots should be made with as little 
inspection as possible consistent with the prescribed risks of accepting poor lots 
and rejecting good ones. This suggests sequential sampling which has come into 
widespread use in the past few years and, in fact, some types of missiles are 
now inspected in that manner. 

There are several important parameters to be inspected on each round in 
missile production; a few such characteristics are action time, thrust or impulse, 
and some measure of chamber pressure. These variables are interrelated and 
hence the problem is a multivariate one. In present day operations, separate 
sequential plans must be set up for each parameter. It is, therefore, possible 
to get conflicting answers about the quality of a lot, sampling may terminate 
for one characteristic before another and there is no appreciation of the true 
sampling risks involved in the overall program. It is obvious that a sequential 
multivariate technique should be used. Although the motivation for this problem 
was in the field of ballistic missiles, the same problem exists any place where 


? Partially sponsored by the Statistics Branch, Office of Naval Research, U. S. Navy. Re- 
production in whole or in part is permitted for any purpose of the United States Government. 

? Now at the Eastman Kodak Company, Rochester, New York. 

* Now at the Florida State University, Tallahassee, Florida. 
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sequential sampling is required and more than one parameter is under con- 
sideration. 


In this article we will give some multivariate sequential inspection schemes 
for the characteristic averages both for the case where the population covariance 
matrix = is known or assumed known (a typical quality control situation) and 
where it must be estimated from the sample. It will be assumed that the variables 
under consideration have a multivariate normal distribution. When the co- 
variance matrix is known, we develop a sequential x”-test; when the covariance 
matrix is estimated from the sample, we develop a sequential T?-test. The 
mathematical justification for these procedures is given in [7]. 


2. SEQUENTIAL UNIVARIATE AND MULTIVARIATE 
PROCEDURES FOR TESTING MEANS 


Statement of the hypotheses 


In univariate situations, test procedures have been constructed to test the 
null hypothesis 


Ho:p=uo (or wu — uo = 4) 
against the alternative 
H,:u Aw (or wr uo # 4). 


When these procedures are extended to the sequential case, it is customary 
to replace these with more specific hypotheses, viz: 


Ho ih = Mo (or w— uo = 4p) 
Hitp= mh (or  — po = 4,), for a one-sided alternative 
and 


H, : (u — wo) = +64, for a two-sided alternative. 


For the case where the population variance is known, the sequential procedures 
have been worked out by Wald [14] and for the case where the variance is not 
known, by Wald, Rushton [11], [12] and others. 

In the multivariate case, these expressions could be replaced by p-variate 
vectors. The null hypothesis could be given as 


Hy: u = Uo 


but it becomes very difficult to specify a meaningful single alternative since 
there are, presumably, infinitely many points in p-space that are of equal im- 
portance as alternatives and even a hypothesis of the type 


Hy 24 — to = % 


would be difficult to specify. It is easier to operate with the surfaces of p-di- 
mensional ellipsoids. For instance, the statements uy = w and 


(u — w)=‘(u — wo)’ = 
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are identical but the quadratic form of the latter expression can also be set 
equal to some scalar quantity viz: 


(uw — wo) 2"'(u — wo)’ = 
which represents the entire surface of a p-dimensional ellipsoid while the ex- 
pression uw — wo = 5 would represent only one point. Similarly, the alternative 


hypothesis would be of the same form but equal to a larger scalar value. Our 
hypotheses become 


H, :(u — wo) =~ '(u — wo)’ = AS. (quite often zero) 
Hy :(u — w)=(u — wo)’ = Ms (A > A). 
Test procedures 


Although the forms of the sequential procedures differ for the cases & known 
and = unknown, they are quite similar in administration. All sequential pro- 
cedures, in the Wald sense, employ a sequential probability ratio p,./Po, which 
is evaluated after each new observation is taken. Let a and 6 denote the usual 
Type I and Type II errors. If, after n observations have been made, Pin/Don < 
8/(1 — a), we accept Ho ; if Pin/Pon = (1 — 8)/a, we accept H, ; if B/(1 — a) < 
Din/Pon < (1 — 8)/a, we infer that we do not have enough information and 
proceed to take another observation and repeat the entire procedure. 


If the population covariance matrix is known, we have the sequential x’- 
test with 


Pin/Pon a to of (p/2; mdix2/4)/oF 1 (p/2; md3oxn/4) ; 


where & is a p-element vector of sample means based on n observations, 
x, = n(X — w)=E (KX — wo)’ and oF,(c; x) represents the generalized hyper- 
geometric function: 


x? 

de> Wai * der Der Bai t 

which has been tabulated in [6]. If Af = 0, the probability ratio reduces to 
Pin/Don = @™*'” oF y(p/2, ndxn/4). 

If p = 1, this further reduces to: 


oFi(c; x) = 1 +: er ae ae 


Din/Pon er oF(1/2, ndix,/4). 
With a little algebra, it can be shown that this is equal to: 


Pin/Pon =e"? cosh ls 2d (x; = w/e . 


the form given by Wald. 
If the population covariance matrix is not known and must be estimated 
from the same sample as the sample means, we have the sequential T?-test with 


Pie = exp [—n(0? — 03/2] iF, [n/2, p/2; nT, /2n — 1 + Tr)) 


Pon iF, [n/2, p/2; ndoT./2(n — 1 + T2)] 
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where S represents the sample covariance matrix based on n observations, 
T? = n(X — w)S ‘( — wo) and ,F,(a, c; x) represents another generalized 
hypergeometric function 


Sia ax , aa + 1)x* , a+ Ia+2)° , 
Fi(a,e5z) = 1+ + e(c + 1)2! - c(e + 1)(e + 2)3! + 


which is more familiarly known as the confluent hypergeometric function and 
has been tabulated, in particular, in [8] and [13]. If 45 = 0, the probability 
ratio reduces to: 


Pin/Pox = €™*'”? F,[n/2, p/2; niT7/2(n — 1 + T?)). 
If p = 1, this further reduces to: 


Pin/Pon = €™*”? ,F,[n/2, 1/2; ndste/2(n — 1 + fh] 


the result given by Rushton. 

It is shown in [7] that both of these procedures terminate with probability 
unity and furthermore that all of the requirements of Cox’s Theorem [2] for 
sequential tests of composite hypotheses are fulfilled so that the risks of in- 
correctly accepting H, and H, are approximately equal to a and 8 respectively. 

It is quite apparent that the evaluation of p,,/Po, after each observation can 
become quite time consuming. The sequential 7-test involves a matrix in- 
version after each observation but with modern computing facilities, this is 
not the problem it once was. Both tests require either the evaluation of hyper- 
geometric series which converge slowly or a table look-up which involves, in 
most cases, non-linear interpolation. The probability ratios involve a, 8, n, p, 
2 , A? , and either x’ or 7”. For a given situation, everything is known in advance 
except x’ or 7” and in many instances, it might be preferable for each value of 
n to use the Newton-Raphson method to solve these equations for the values 
of x’ or 7’, required to reach a decision. These values may then be displayed 
in tabular form. 

Let the solutions for x’ and 7” in the equations 


Pin/Pon = B/(L —a@) be x, and TJ, respectively, 


and let similar solutions of 


Din/Pon = (1 — B)/a be x2 and TJ? respectively. 


In the application of the sequential x*-test, for a given sample size of n, once 
a, 8, p, Xo and dj have been specified, a sample value of xi < x; will result in the 


acceptance of H, , a sample value of x; > x. will result in the acceptance of 
H, and any time that x; < x; < x; another observation must be made. A similar 
procedure holds for T’. Tables are given in [4] for these limits as a function 
of n for both the sequential x’-test and sequential 7?-test for a = 8 = .05; 
p = 2,3, --- ,9;3 = 0, several appropriate values of \? and suitable ranges of 
values for n consistent with the sample sizes required, on the average, to reach 
a decision. 
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3. APPLICATION TO BALLISTIC MISSILE TESTING 


Ballistic missile testing can be carried on in two ways: (1) flight or operational 
testing where the missile is actually put into flight, and (2) static testing where 
the propulsion unit is fastened down to prevent its flight during burning and 
measures of characteristics such as thrust and pressure are obtained by placing 
strain gages on the nose of the missile. This latter type of testing is used in the 
acceptance sampling of such missiles as the Honest John and the Nike. Our 
problem specifically deals with the inspection of the solid propellant boosters 
for the Nike-Ajax and the Nike-Hercules. ; 

Three characteristics will be considered: 


1. Action time. This is a measure of the total elapsed time during which 
the propellant in the booster is burning (neglecting after-burning). 

2. Total impulse. This is a measure of the total force exerted by the 
booster during, burning. 

3. Maximum pressure. This is a measure of the maximum pressure develop- 
ment inside the chamber of the booster during burning. 


The boosters under consideration are manufactured in regular production 
lots which are passed or rejected on the basis of sampling inspection. Current 
inspection procedures required nine rounds to be tested in order to pass judgment 
on the quality of the lot. In order to illustrate both the sequential x’- and T?-tests, 
some liberties will be taken with the assumptions related to the inspection 
program for these missiles. We will consider two cases: Case I will consist of an 
example where only the standards and tolerances are given and only scant 
information is available concerning the actual variability of the production 
process in terms of action time, total impulse, and maximum pressure. Case IT 
will assume that sufficient sampling inspection experience has been obtained 
in order for the population covariance matrix for these three variables to be 
known. The other liberty we shall take is in the establishment of upper and 
lower tolerances for all variables when, in fact, only one-sided tolerances exist 
in some instances. Our procedures at present will accommodate only two- 
sided tolerances. 

It should be emphasized that although the data and, to some extent, the 
tolerances used in this chapter are from an actual production situation, the 
methods employed to establish \{ and \j in no way reflect the actual acceptance 
sampling policy employed in the Nike program. 

Our variables will be coded as follows: 

= (Action time — 3.000) seconds X 10°. 


(Total impulse — 143,000) pounds-seconds X 107°. 
z,; = (Maximum pressure — 1,000) PSI. 


In terms of these coded variables, our standards and tolerances for individual 
rounds are defined as follows: 


Action time: z, = 100 + 120, 
Total impulse: z, = 200 + 300, and 
Maximum pressure: z; = 50 + 200. 
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Case I: & unknown. 


In this first example, it will be assumed that a sequential sampling plan is 
required which will gurantee with risk 8 = .05 that no lot shall be accepted 
which contains more than 2.5% defective rounds and at the same time gurantee 
with risk a = .05 that no lot shall be rejected when the true means of all three 
characteristics are on standard. It will further be assumed that only a meager 
amount of information is available concerning the variability of these character- 
istics. From this information, it is inferred that the individual tolerances con- 
stitute limits of +30 for individual observations about their standards and 
that there is no evidence that the variables are correlated. (The assumption of 
independence regarding tolerances is not always valid. In some types of missiles, 
total impulse and action time must be negatively correlated to insure a fixed 
range for their flight. This will affect )”.) 

Considering each characteristic separately, the requirement that 97.5% of 
the lot must be within tolerances implies that the true lot mean for that character- 
istic cannot be closer than 2.240 to either tolerance limit; conversely, the true 
mean must be within .76¢ of the standard since the tolerances were assumed to 
be +3c¢ limits. Therefore, the lot means for each characteristic should be in the 
intervals: 


x: 100+ 30.4, 

t2: 200 + 76.0, 
and 

Zz: 50+ 50.7. 


Several possibilities exist. One possibly would involve inscribing an ellipsoid 
inside the rectangular solid bounded by yw; + .760,; , 4 = 1, 2, 3 variables. This 
would be rather restrictive and would be used in the case where the specifications 
were to be strictly employed. The non-centrality parameter under H, for a 
given characteristic would be (.76)’ and since the occurance of an out-of-specifi- 
cation condition for any one of the variables is sufficient reason to reject the 
lot, \{ = .5776 where now 


d? = [u, — 100u2 — 200u; — 50], 1600 0 O Iu 100 
0 10000 0 | uw. — 200): 
0 O 4489 |Lus — 50 | 


Considering the crudeness of the determination of )’ in the first place, a value of 
\? = .5 is probably quite adequate. Our hypotheses have been restated as: 


as ’ =0 


Hi: N= .5 


Since the covariance matrix is unknown, we now employ the sequential 
T’-test using the tabular entries in Table I. Acceptance of a lot is not per- 
missible until at least twelve rounds have been fired, and rejection cannot 
occur until at least eleven rounds have been fired. The individual coded obser- 
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Tasie I 
Data, T2 and sequential limits for T? based on the test for 3 = 0,4? = .5,a = 6 = .05 


Observations 
Round 


Number Ze 


1 
2 
3 
4 
5 
6 
7 
8 
9 


vations and the resultant values of JT? are shown in Table I. From either Table I 
or Figure 1, where the successive values of T% are plotted, it can be seen that. 
the sequential process has terminated on the firing of the 23rd round with the 
rejection of the null hypothesis that the lot is of acceptable quality. This would 
appear to be a fairly typical sample since the Average Sample Number when H, 
is true is approximately 24, (27 if H, is true) and represents a substantial re- 
duction from the 37 rounds required for the fixed-sample-size plan having the 
same a, 8, \j and Aj. 

The fact that so many more rounds were required than were used in the 
current testing procedure indicates that this procedure is considerably more 
exacting than the current fixed-sample-size procedure of nine rounds, or con- 
versely, that the value of 7 implied by the fixed-sample size procedure is con- 
siderably larger than .5. 

This is only one method of determining 7 for this situation and consists 
essentially of inscribing an ellipsoid inside the rectangular solid bounded by the 
specifications. A second method would be to circumscribe an ellipsoid around 
this rectangular solid. This is more conservative than the other method and 
would result in more material being accepted. It can be used when the lot is 
considered passable even when all of the characteristics are border line. This 
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SEQUENTIAL T? TEST 
AZ =O 
nt =5 
<< =8=05 


12 13 1415 16 17 18 19 20 22 23:24 
ROUNDS TESTED 


Fiaure 1 


second method would yield a value of \? = 3(.76)? = 1.73. Using a value of 
i = 2.0 with the same data shown in Table I, the null hypothesis would be 
accepted after nine rounds, by coincidence, the amount required by the present 
fixed sample procedure. This adequately demonstrates the problems associated 
with specifying the hypotheses in multivariate problems. (The average sample 
numbers required for this plan, incidently, are approximately 6 if H, is true, 
10 if H, is true and 13 for the corresponding fixed-sample-size plan.) 

Another alternative that may be used when nothing is known about the 
variability of the characteristics is an extension of the variables inspection 
technique [15]. In this procedure, two values of proportion defectives are stated. 
One of these, p, , represents an acceptable lot and the other, p, , represents an 
unacceptable lot. The hypotheses are: 


_e. P =P: 
HA,: p=p>p,. 


In the univariate case, once the tolerances are given and 7, , pz , a, and 8 specified, 
a sequential scheme can be set up utilizing the mean and standard deviation 
of the sample. No similar technique is yet available in the multivariate case for 
specifying \”. This form of the test would require non-zero values of Xj . 

Since the present non-sequential procedure involves testing nine rounds, 
another alternative could be to use the confidence limits for the mean of nine. 
rounds in determining \”. In the present situation, the univariate 95% confidence 
limits for averages of nine is .77s and for three independent variables an estimate 
of \” would be 3(.77)? = 1.78. This would imply that the confidence limits 
approach would suggest a fixed-sample test of a = 8 = .05, \3 = Oand dj = 1.78 
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which is quite closed to that required by circumscribing an ellipsoid about the 
specifications used in this example. 


We see, then, that there are many ways of specifying \” under H, and the 
choice of methods will depend on the individual situation. 


Case II: & known. 


We will now assume that in the time which has elapsed since the rejection 
of the lot discussed in Case I sufficient information has been gathered so that 
the population covariance matrix can be assumed to be known. From the data 
of approximately 100 rounds, the covariance matrix was estimated to be: 


870 -—400 -—200 
= =| -—400 7075 1535 
—200 1535 1300 
The hypotheses are now: 


H;: (wu — w)= (ue — wo) = x +=0,1. 


The standard deviations for these characteristics are o, = 29.5, o. = 84.1, 
and o; = 36.1. It appears that the tolerances imposed on the process are greater 
than the +30 as assumed in Case I. This suggests several possibilities. One 
possibility would be to use the natural tolerances of the process and let \j remain 
at .5. This procedure is often employed when the acceptance sampling program 
is also used to control the process, but in the case of these Nike boosters, too 


much material of acceptable quality would be rejected and considering the cost 
factor this would not be a recommended procedure. 

A second possibility would be to inscribe an ellipsoid based on the covariance 
matrix within the rectangular solid implied by the specifications and solve for 
\} . By the method given in the appendix, the largest ellipsoid that can be so 
inscribed corresponds to a d? of .81. Using \3 = 1 and employing a sequential 
x'-test of the data shown in Table II, the sequential procedure was terminated 
with the rejection of the lot after four rounds had been tested (See Figure 2a). 


Taste II 
Data, x3,, x}, and sequential limits for these quantities based on the tests: 
(1) 4%, = 0,4, =1la=6 = .05 
(2) 43, = 3,3, = 6,a = 6B = .05 


Observations 
Round 


Number X1 Ze Zs 


151 272 70 4.670 - 34.99 

159 215 46 8.498 21.20 443 
178 157 48 14.834 - 16.79 1.503 
187 152 74 24.243 - 14.73 3.468 
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(a) (b) 
SEQUENTIAL Xi, TEST SEQUENTIAL X3 -TEST 
Ax,"O A= 3 
Aq,= 1 AZ, = 6 
oc =£-.05 X= £=.05 


123456789 23486789 
ROUNDS TESTED ROUNDOS TESTED 


FIGure 2 


(ASN under H, : 13; under H, : 9; Fixed-sample-size plan: 18.) This fairly quick 
decision was the result mainly of the longer action times (z;). 

Inscribing ellipsoids is, of course, the most discriminating method as we have 
already seen in Case I. Circumscribing ellipsoids becomes somewhat more 
unrealistic when dealing with correlated variables because major portions of the 
ellipsoids along the major axis may be considerably out of specification. However, 
one can examine the different ellipsoids that come in contact with the corners 
of the rectangular solid. It is possible that the smallest of these might bound a 
reasonable indifference region. In this particular case, the minimum value of \’ 
corresponding to these ellipsoids is 2.6 and for this particular lot would cut the 
amount of sampling required to three rounds. 

One might ask what the effect is of rounding the values of \” to convenient 
numbers. The 7?-tables in [4] start with increments for \? of .5 and these in- 
crements increase as )” increases. It is felt that in the initial formulation of a 
problem, where a sequential T’-test might be employed, statements of hypotheses 
are fairly general anyway. On the other hand, problems such as some repetitive 
quality control operations generally would require a sequential x’-test. These 
tables will handle a greater variety of values of dj but if the problem requires a 
fairly precise value of )’, it would probably also justify the computation of the 
corresponding limits. 


4, GENERALIZED x’-STATISTICS 


An additional technique which may be employed for Case II, the situation 
where the covariance matrix is known, involves the use of generalized x -sta- 
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tistics [5]. This allows us to compare not only the sample mean of a lot with the 
standard, but S, the covariance matrix of the sample, with =, , the covariance 
matrix that has been established over the base period. It is, of course, possible 
for the mean of a lot to be close to the standard but for the variability of indi- 
vidual items to be so excessive as to impair the over-all quality of the lot. This 
technique involves the three different x*-statistics illustrated in Figure 3 which 
contains, in two-space, the standard, three observations, and their mean. 


GENERALIZED X?- STATISTICS 
—_— <= 
— Xi, 
+--+ Xs 


STANDARO 


FIGURE 3 


x5 represents the total generalized distance of all the individual observations 
from the standard (i.e. a measure of the total variability of the sample about 
the standard). If 


xi = (Ki — wo) Zo (Ki — wo)’ 9 t=1,+++,n 
then 
xo = dX Xi 

A useful property of this statistic is that it can be broken up into two other 
x'-statisties representing the level and dispersion of the sample—a sort of 
multivariate analogue of the average and range used in control charts or the 
treatment and residual sums of squares in the analysis of variance. In this section, 
the component related to the level of the sample will be denoted by 


Xe = 2X — wo) Zo (X — wo)’ 
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the same quantity we have been dealing with throughout the previous section. — 
xix is a measure of the distance from the sample mean to the standard.The other 
component 


xb = (n — 1) TrS= >" (where Tr A = >; a,,) 


is a measure of the variability of the sample observations about their own mean 
(Ap = Tr XX>'). However, since x = xix + xp, then x5 can be obtained by 
substraction and S need not be computed at all. This statistic can be used to 
test the hypothesis 


Ho: & =X or Ap = Ap, = P (usually) 
against the alternative 
Hi: &£>X% or M=A,>P 
for the sequential case. Since x%, is used to test the hypothesis 
Ho: Nw = Xe. 


against the alternative 


H, ° i = Ae. 
it may then be implied that x} may be used to test the hypothesis 
Hy: = 2%, = Xb. +2=Dys,, 


against the alternative 
H,: B=, =m, + S— Dy, 


The same sort of reasoning used to derive the sequential probability ratio for 
xix can be used to obtain them for x3 and x3 respectively: 


i 2 42) 79) oF il(n — 1)p/2; (n — 1)d,xd/4) 
Pra/Pon = exp [—(0 — 1)(X3, — 03,)/2] Se = Bile — Dag 


and 


sa 2 2 oF [np/2; ndo,x0/4] 
Pin/Pon = exp [—N(Xo, — Ao.)/2] oe. 


These test procedures may be carried out in parallel with the one discussed 
under Case II in the preceding section. It will be recalled that in that case 
xs. = O and X,, = 1. For the sequential case it is customary to set A>, = ? 
since if £ = £,, ZX,’ = Land TrI = p. In the present example A}, = 3 and it 
will be assumed that we wish to run a risk of only 5% of failing to detect a general 
increase of 100% in the variability of the product; hence, \3,, = 6. The results of 
this sequential test using the same data as used to illustrate Case II are shown 
in Table II and Figure 2b. Hy is accepted after four rounds; hence it can be 
concluded that the variability of the lot has not changed even though the level of 
the lot has shifted. 
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The over-all test, x3 , is of limited use being less sensitive than the other two 
and consequently would require more sampling to reach a decision. If H,was 
accepted for the xj-test, one would have to go back to the results of the x3,- and 
x3-tests anyhow and these, presumably, would have already terminated. 


5. ADDITIONAL COMMENTS 


A number of additional comments should be made in connection with these 
procedures: 


a. Two-sample procedures. 


Suppose one wished to sequentially test the hypothesis that two samples came 
from populations having the same mean (i.e. Ho : yi: = wz) against the alternative 
H, : ui * we . These can usually be considered paired samples in the sequential 
case since they are obtained in that manner. Let the covariance matrices of 
these two populations be £,, and X22 respectively and the cross-covariance 
matrix Xj. . Then the covariance matrix of the difference between vectors of 
observations taken from each population is 2, + Zs — Zi. — Ef, [1]. If By, 
X.. and Z,. are known, the two-sample problem is reduced to an ordinary 
sequential x*-test. If the covariance matrices are not known, ©, + Ys. — 
x,. — ={, must be estimated directly from the differences between the observa- 
tion vectors and an ordinary sequential T’-test will result. 


b. Tables. 


With the wide variety of combinations of n, p, \5 , Ai , a and B required for 
these sequential tests, it would be quite a task to prepare tables to facilitate 
these procedures that would be even as complete as those complied by the 
National Bureau of Standards for the sequential ¢-test [9]. The aforementioned 
tables of Freund and Jackson [4] supply a wide variety of combinations of n 
and \j for the sequential x’-test when \?} = 0 fora = B = .05; p = 2,3, --- , 9. 
Similar tables are included, though for a more select range of n and j , for the 
sequential T*-test. No formal tables have been prepared for the sequential 
x; and x tests or for the x3, test when A} ¥ 0. 


c. Average sample numbers. 


All of these multivariate procedures are tests of composite hypotheses and 
hence the approximations for average sample numbers given by Wald cannot 
be used. A very crude approximation due to Bhate [10] has been used to estimate 
the ASN’s for all of these procedures and is given in [7]. A Monte Carlo study 
by Freund and Appleby [3] has indicated for a limited number of cases that 
these approximations are reasonably close. 


d. Computational requirements. 


Admittedly, the computations required for the sequential 7?-test are not: 
modest for even a moderate p involving, as it does, a matrix inversion after 
each observation is made. However, in ballistic missile testing, the cost of 
testing far outweighs the computational cost, so the sequential method is 
still justified. For a low-cost high-volume situation, sequential sampling by 
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groups might be appropriate. The sequential x’-test requires considerably less 
computation and, in general, should not be much of a problem. 


e. Other fields for future development. 


Much work needs to be done in the field of sequential multivariate tests. 
The k-sample multivariate problem has received practically no attention at all. 
A study of truncated or restricted schemes would prove quite valuable since 
most actual problems must be terminated after some reasonable time if a de- 
cision has not been reached. 

Work could also profitably be done on multiple sampling (sequential sampling 
by groups) to overcome some of the scheduling and computing problems associ- 
ated with sequential sampling of a high-volume, low-cost product. Until such 
time as this can be done, Wald’s grouping method (i.e. using x, and x; or 7? 
and 7. with n corresponding to the total sample size at the end of each sample) 
could be used. Its disadvantage of the ASN values being higher than necessary 
is partially compensated by the fact that a and 8 are overstated to some extent. 
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APPENDIX 


A Method of Inscribing an Ellipsoid Within a Rectangular Solid 
Consider the three-dimensional rectangular solid defined by the tolerances 
imposed on the Nike Booster. Let y, = xz, — 100, y = wt. — 200 and 
Ys = 23 — 50. Since any ellipsoid of interest will be symmetric about the center 
of this solid, we need be concerned with only three sides of the solid. Let us 
first consider the side y; = 50.7, the upper tolerance for maximum pressure. 


We wish to find out the y, and y,-coordinates for an ellipsoid which will just 
touch this side. We do this by setting 


= [y, yo 50.7]E7 ly: yo 50.7)’ 
= Ayi + Byys + Cy: + Dy: + Ey. + F. 


This represents a family of ellipses resulting from the intersection of the plane 
y; = 50.7 with all possible ellipsoids represented by the quadratic form y=" 'y’. 
We are interested in the case where this ellipse becomes a single point which 
can be obtained by solving the system of equations: 


an’ 
OY; 
an’ 
OY2 


for y, and y2 . These can then be substituted back in the original expression for 
\”. This procedure can then be repeated for the surfaces y, = 30.4 and y, = 76.0. 
The minimum value of \’ obtained from these three determinations will define 
the smallest ellipsoid that can be inscribed in the rectangular solid. 

Example: In the present example (Case II, Section 3) with 


001206 000032 000166 
=" = | .000032 .000191 —.000220 
.000166 —.000220 001057 


y=[y ye 50.7] 
* = .001206yi + .000064y,y2 + .000191y3 + .016832y, 


— .022308y. + 2.717013. 
From this we obtain 


— .002412y, — .000064y, =  .016832 
— .000064y, — .000382y, = —.022308 
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with the solution y, = 59.8331, y, = —8.5661 and \” = 1.98. When the side 
y, = 30.4 is studied, \7 = 1.09 and with the side y, = 76.0, \* = .81 which 
represents the smallest ellipsoid of the form y=~*y’ which can be inscribed within 
the tolerances. 


This method can easily be generalized to handle any number of dimensions. 
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Tables for Maximum Likelihood Estimates: Singly 
Truncated and Singly Censored Samples* 


A. Currrorp CoHEN, Jr. 
The University of Georgia 


In a previous paper in Technometrics, Vol. 1, 1959, the author derived the maxi- 
mum liklihood estimates of the mean and variance for simply truncated or simply 
censored samples drawn from a Normal distribution. This paper extends considerably 
the tables originally published, and contains a further worked example. 


Maximum likelihood estimators presented in the August 1959 issue of this 
journal [1] for the mean and variance of a normal distribution when samples are 
singly truncated or singly censored, involved only one auxiliary estimating 
function with each of these sample types. Estimates as well as their asymptotic 
variances are relatively easy to calculate when the necessary tables are available, 
but unfortunately the tables originally provided failed to prove adequate in all 
cases. The present paper constitutes a response to numerous requests for a 
more complete tabulation of the pertinent functions. 

Our concern is with singly truncated samples and with singly censored samples 
of both types I and II when the random variable is normal (u, c). For all samples 
under consideration, N designates the total number of sample specimens, and n 
the number whose measurements are known. These three sample types are 
more completely described as follows: 

Singly Truncated Samples. In samples of this type, a terminus 2p is specified. 
Observation is possible only if x > 2» , in which case truncation is said to be 
on the left, or if x < 2» , in which case truncation is said to be on the right. In 
this case, measurements are known for all sample specimens and hence N = n. 
In certain applications it might be preferable to consider that the restriction 
(i.e. truncation) is imposed on the distribution rather then on the sample being 
observed. The adoption of this latter point of view involves no change in the 
estimators. 

Type I Singly Censored Samples. As in the singly truncated samples, a termi- 
nus 2» is specified, but in this case sample specimens whose measurements fall 
in the restricted interval of the random variable may be identified and thus 
counted, though not otherwise measured. When the restricted (censored) interval 
consists of all values x < 2» , censoring is said to occur on the left. When the 
censored interval consists of all values + > 2» , censoring is said to be on the 
right. The remaining specimens for which x > 2» or (x < 2) are fully measured 
without restriction. Samples of this type thus consist of N observations of which 


n are fully measured and N — n are censored with N being fixed and n a random 
variable. 


* Sponsored by the Office of Ordnance Research, U.S. Army. 
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Type II Singly Censored Samples. In samples of this type, full measurement 
is made only for the n largest observations in which case censoring is on the left 
or for the n smallest observations in which case censoring is on the right. Of the 
remaining N — n censored observations, it is known only that x < z, or (x > z,), 
where z, is the smallest (or largest) fully measured observation. In samples of 
this type both N and n are fixed, but x, is a random variable. 

For the convenience of readers who might not have a copy of reference [1] 
available, the estimators obtained there are repeated below without derivation. 
The caret (~) serves to distinguish maximum likelihood estimators or estimates 
from the parameters being estimated. 


Estimators for Singly Truncated Samples 
A=f— WH — x), 
= + H% — x)’. 
Estimators for Type I Singly Censored Samples 
A= t—NME—m), 
& =s' + KE — 2,)’. 
Estimators for Type II Singly Censored Samples 
A= — XE —2,), 
= s' + ME — z,). 


In case of the above cases, = and s” are the mean and variance respectively of 
the n measured sample observations. 


~= . z,/n, 


(3) 


2 =\2 
s = Dz &)"/n. (4) 
The auxiliary estimating functions 6 and \ were defined in [1] in connection 
with derivations of the above estimators. They are presented here in tables 1 
and 2 as functions of y and of y and h respectively where y = [1—Z(Z—£)]/(Z—£)’ 
in the case of truncated samples, and y = [1 — Y(Y — £]/(Y — &) in the 
case of censored samples. As defined in [1] 


Z = 9@)/[1 — F@], and Y = [h/(1 — h)le®/F@), 


where F(£) = f*.. o(t) dt, o(t) = (~/2n)~ exp — #°/2, and where ¢ = (x) — u)/o 
in truncated and type I censored samples, while § = (x, — u)/o in type II 
censored samples. In both type I and type II censored samples, h is the pro- 
portion of censored observations; i.e. h = (N — n)/N. ° 

In Table 1, which applies to truncated samples, 6(y) is given at equal intervals 
of 0.001 for the argument 7, whereas in the original table, these intervals were 
unequal and somewhat wider. For any given truncated sample, after computing 
4 = s"/( — 2)", enter table 1 with y = 7 and interpolate as necessary to obtain 
6 = 0(4). Ordinarily, linear interpolation will be adequate. With 6 thus de- 
termined, the required estimates follow from (1). 
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Table 1. AUXILIARY ESTIMATION FUNCTION © 
For Singly Truncated Samples 





@Srrougqhs 
mD wom wo 
S283 Saks 


-004 


-00001 


00004 
-00013 
-00031 
-00063 
-00112 


-00184 
-00283 
00410 
-00571 
.00769 


-01006 
01286 
01611 
01986 
02413 


02895 
03435 
04036 
-04704 
-05439 


-06247 
07131 
-08095 
-09144 
- 10281 


+1151 
- 1284 
-1428 
- 1582 
1748 


- 1926 
+2117 
+2321 
+2540 
+2774 


+3023 
+3290 
-3575 
3878 
+4202 


+4547 
4915 
+5307 
-5725 
-6170 


-6645 
+7150 
7689 
-8263 
-8876 


-9530 
1.023 
1.097 
1.177 
1.262 


1.353 
1.451 
1.556 
1,668 
1.789 


1.919 
2.059 
2.210 
2.373 
2,549 


2.740 
2.948 
3.173 
3.420 
3.690 


3.986 
4.311 
4.67 
5.07 
5.51 


6.01 
6.56 
7.19 
7.91 
8.73 


005 


-00001 


-00004 
-00014 
-00033 
-00067 
-00118 


-00193 
00294 
-00425 
-00589 
-00791 


-01032 
-01316 
-01646 
-02026 
02458 


02946 
-03492 
-04100 
04774 
-05516 


-06332 
-07224 
-08196 
-09254 
- 10400 


-1164 
- 1298 
-1443 
.1598 
1765 


-1944 
+2136 
+2342 
+2562 
-2798 


3049 
-3318 
3604 
-3910 
-4236 


.4583 
-4953 
5348 
.5768 
-6216 


-6694 
+7202 
7745 
-8323 
-8940 


-9598 
1.030 
1.105 
1.185 
1.271 


1.363 
1.461 
1.567 
1.680 
1.802 


1.932 
2.073 
2.225 
2.390 
2.567 


2.760 
2.969 
3.197 
3.446 
3.718 


4.017 
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+006 


00001 


00005 
-00016 
-00036 
-00071 
-00125 


-00202 
-00305 
-00440 
-00608 
-00813 


-01058 
01347 
-01682 
-02067 
-02504 


-02998 
-03550 
-04165 
04845 
-05594 


-06418 
-07317 
08298 
-09364 
- 10520 


+1177 
+1312 
-1458 
-1614 
-1782 


-1963 
+2156 
+2364 
+2585 
+2822 


-3075 
3346 
+3634 
3941 
4269 


-4619 
+4992 
- 5389 
-5812 
-6263 


-6743 
-7255 
+7801 
-8383 
-9004 


- 9666 
1.037 
1.113 
1.194 
1.280 


1.373 
1.472 
1,578 
1.692 
1.814 


1.946 
2.088 
2.241 
2.407 
2.586 


2.780 
2.991 
3.221 
3.472 
3.747 


4.048 
4.380 
4.75 
5.15 
5.61 


6.11 
6.68 
7.33 
8.06 
8.91 


-007 


-00001 


-00006 
-00017 
-00039 
-00075 
-00131 


00211 
-00317 
-00455 
-00627 
-00835 


-01085 
-01378 
-01718 
-02108 
02551 


-03050 
-03609 
-04230 
-04917 
-05673 


-06504 
-07412 
-08401 
-09476 
- 10641 


-1190 
+1326 
-1473 
- 1630 
+1800 


- 1982 
+2176 
+2385 
+2608 
+2847 


-3102 
3374 
- 3664 
-3973 
+4303 


4655 
- 5030 
- 5430 
- 5856 
+6309 


-6793 
+7308 
+7857 
-8443 
-9068 


+9735 
1.045 
1.121 
1.202 
1.289 


1.382 
1,482 
1.589 
1.704 
1,827 


1.960 
2.103 
2.257 
2.424 
2.605 


2.800 
3.013 
3.245 
3.498 
3.776 


4.080 
4.415 
4.79 
5.20 
5.65 


6.17 
6.74 
7.40 
8.14 
9.00 


-008 


-00002 


00007 
-00019 
00042 
00080 
-00138 


-00220 
00330 
00470 
-00646 
-00858 


-01112 
-01410 
-01755 
-02150 
02599 


-03103 
- 03668 
-04296 
-04989 
-05753 


-06591 
.07507 
08504 
-09588 
- 10762 


-1203 
-1340 
- 1488 
- 1647 
+1817 


+2001 
+2197 
+2407 
+2631 
- 2871 


+3128 
+3402 
- 3694 
-4005 
4338 


-4692 
- 5069 
5471 
- 5900 
-6356 


-6843 
+7361 
7914 
8504 
+9133 


+9804 
1.052 
1.129 
1.211 
1,298 


1.392 
1,492 
1.600 
1.716 
1.840 


1.974 
2.118 
2.273 
2.441 
2.623 


2.821 
3.036 
3.270 
3.525 
3.805 


4.112 
4.450 
4.82 
5.24 
5.70 


6.22 
6.81 
7.47 
8.22 
9.09 
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Table 2. AUXILIARY ESTIMATION FUNCTION A(h,Y) 
For Singly Censored Samples 


-010100 .020400 .030902 .041583 .052507 .063627 .074953 .086488 .09824 .11020 .17342 .24268 
-010551 .021294 .032225 .043350 .054670 .066189 .077909 .089834 .10197 .11431 .17935 .25033 
-010950 .022082 .033398 .044902 .056596 .068483 .080568 .092852 .10534 .11804 .18479 .25741 
-011310 .022798 .034466 .046318 .058356 .070586 .083009 .095629 .10845 .12148 .18985 .26405 
-011642 .023459 .035453 .047629 .059990 .072539 .085280 .098216 .11135 .12469 .19460 .27031 


-011952 .024076 .036377 .048858 .061522 .074372 .087413 .10065 .11408 .12772 .19910 .27626 
-012243 .024658 .037249 .050018 .062969 .076106 .089433 .10295 .11667 .13059 .20338 .28193 
-012520 .025211 .038077 .051120 .064345 .077756 .091355 .10515 .11914 .13333 .20747 .28737 
-012784 .025738 .038866 .052173 .065660 .079332 .093193 .10725 .12150 .13595 .21139 .29260 
-013036 .026243 .039624 .053182 .066921 .080845 .094958 .10926 .12377 .13847 .21517 .29765 


-013279 .026728 .040352 .054153 .068135 .082301 .096657 .11121 .12595 .14090 .21882 .30253 
-013513 .027196 .041054 .055089 .069306 .083708 .098298 .11308 .12806 .14325 .22235 .30725 
-013739 .027649 .041733 .055995 .070439 .085068 .099887 .11490 .13011 .14552 .22578 .31184 
-013958 .028087 .042391 .056874 .071538 .086388 .10143 .11666 .13209 .14773  .22910 .31630 
-014171 .028513 .043030 .057726 .072605 .087670 .10292 .11837 .13402 .14987 .23234 .32065 


-014378 .028927 .043652 .058556 .073643 .088917 .10438 .12004 .13590 .15196 .23550 .32489 
-014579 .029330 .044258 .059364 .074655 .090133 .10580 .12167 .13773 .15400 .23858 .32903 
014775 .029723 .044848 .060153 .075642 .091319 .10719 .12325 .13952 .15599 .24158 .33307 
-014967 .030107 .045425 .060923 .076606 .092477 .10854 .12480 .14126 .15793 .24452 .33703 
-015154 .030483 .045989 .061676 .077549 .093611 .10987 .12632 .14297 .15983 .24740 .34091 


-015338 .030850 .046540 .062413 .078471 .094720 .11116 .12780 .14465 





For all values 0< Y<1,.A(O,Y) = 0, 


In Table 2, which applies to censored samples, A(h, y) is given for h = 0.01(0.01) 
0.10(0.05) 0.70(0.10) 0.90 and for y = 0.00(0.05) 1.00. This represents a con- 
siderable enlargement of the original table which was limited to entries for 
which h < 0.50. For any given censored sample, after computing 7 = s*/(@ — 20)” 
or 4 = 8°/(@ — z,)° andh = (N — n)/N, enter table 2 with these values of the 
two arguments to obtain } = A(h, 7) using two-way interpolation. Here again 
linear interpolation should be sufficiently accurate for most requirements. 
With i thus determined, the required estimates follow from (2) or from (3), 
the choice of equations depending on sample type. 

The asymptotic variances and covariances may be calculated as 


2 2 


Va) ~ Fun, Cov (a, é)~ 5 may 


2 
Mie 


VOQ)~Sun, pe ~ B= 
N MiiMe2 : 





TABLES FOR TRUNCATED OR CENSORED SAMPLES 539 


where the u,; above are so defined that the expressions of (5) equal the corre- 
sponding expressions given in [1]. 

In order to permit ready evaluation of the y,; of (5), and thereby simplify 
the calculation of asymptotic variances and covariances, Table 3 has been added. 
(Various less extensive tables giving certain of the entries included in Table 3 
have previously been published by Bliss [3], Gupta [4], Hald [5], and Cohen 
and Woodward [2]. Credit for the Bliss tables relating to censored samples, 
which were the first of these to appear, was inadvertently attributed to W. L. 
Stevens both by Hald [5] and by the writer [1]. It has recently been learned 
that while Stevens derived the formulas involved, computation of the tables 


Table 3. VARIANCE FACTORS FOR SINGLY TRUNCATED AND SINGLY CENSORED SAMPLES 
For Truncated Samples For Censored Samples coke 
Rest. 


1.00054 -.001143 .502287 -.001613]| 1.00000 -.000006 .500030. -.000001 
1.00313 ~-.005922 .510366 -.008277/ 1.00001 -.000052 .500208 -1000074 
1.01460 ~-,024153 .536283 -.032744/ 1.00010 -,000335 °501180 -.000473 


1.05738 -.081051 .602029 ~,101586 | 1.00056 -.001712 .505280 ~-,002407 
1.07437 -.101368 .622786 ~,123924/ 1.00078 -.002312 .506935 -,003247 
1.09604 -.126136 .646862 ~-.149803/ 1.00107 -.003099 .509030 -.004341 
1.12365 -,156229 .674663 -.179434/ 1.00147 -.004121 .511658 -.005757 
1.15880 -,192688 .706637 -.212937/ 1.00200 -,005438 .514926 -.007571 


1.20350 -,236743 .743283 -.250310 | 1.00270 -,007123 .518960 -.009875 
1.26030 -.289860 .785158 -~.291388 | 1.00363 -,.009266 .523899 -,.012778 
1.33246 -.353771 .832880 -.335818 | 1.00485 -.011971 .529899 -.016405 
1.42405 -.430531 .887141 -.383041 | 1.00645 ~-.015368 .537141 -.020901 
1.54024 -~,522564 .948713 -,.432293 | 1.00852 -.019610 .545827 -.026431 


1.68750 -,632733 1.01846 -~.482644 | 1.01120 -,024884 .556186 -.033181 
1.87398 -.764405 1.09734 -.533054 | 1.01467 -,031410 .568471 -.041358 
2.10982 -.921533 1.18642 ~.582464/ 1.01914 -.039460 .582981 -.051193 
2.40764 -1.10874 1.28690 -,.629889 | 1.02488 -.049355 .600046 -.062937 
2.78311 -1.33145 1.40009 ~,.674498 | 1.03224 -.061491 .620049 -.076861 


3.25557 -1.59594 1.52746 -.715676 | 1.04168 -.076345 .643438 -.093252 
3.84879 -1.90952 1.67064 -.753044 | 1.05376 -.094501 .670724 -.112407 
4.59189 -2.28066 1,83140 -~.786452 | 1.06923 -.116674 .702513 -.134620 
5.52036 -2.71911 2.01172 -,815942 | 1.08904 -.143744 .739515 -.160175 
6.67730 -3.23612 2.21376 -.841703 | 1.11442 -.176798 .782574 -.189317 


8.11482 -3,.84458 2.43990 -~,864019 | 1.14696 -,217183 .832691 -.222233 
9.89562 -4.55921 2.69271 -~,883229 | 1.18876 -~.266577 .891077 ~-.259011 
12.0949 -5.39683 2.97504 ~.899688 | 1.24252 -.327080 .959181 -.299607 
14,8023 -6.37653 3.28997 -.913744 | 1.31180 -,401326 1.03877 -.343800 
18.1244 -7,51996 3.64083 -.925727 | 1.40127 -.492641 1.13198 -.391156 


22.1875 -8.85155 4.03126 —.935932 | 1.51709 -.605233 1.24145 -.441013 
27.1403 =-10.3988 4.46517 -~.944623 | 1.66743 -.744459 1.37042 -.492483 
33.1573 =-12,1927 4,94678 -~,.952028 | 1.86310 -.917165 1.52288 -.544498 
40.4428 -14,2679 5.48068 -.958345 | 2.11857 -1.13214 1.70381 -.595891 
49.2342 -16,6628 6.07169 -.963742 | 2.45318 -1.40071 1.91942 -.645504 


59.8081 -19.4208 6.72512 -.968361 | 2.89293 -1.73757 2.17751 -.692299 
72,4834 -22.5896 7.44658 -.972322 | 3.47293 -2.16185 2.48793 -.735459 
87.6276 -26.2220 8.24204 -.975727 | 4.24075 -2.69858 2.86318 -.774443 
105.66 -30.376 9.1178 -.97866 | 5.2612 -3.3807 3.3192 -.80899 
127.07 -35.117 10,081 -.98119 [6.6229 -4.2517 3.8765 -.83912 


152.40 -40.515 11.138 -,98338 | 8.4477 -5,.3696 4.5614 -,86502 
182.29 =-46.650 12.298 -.98529 | 10.903 -6.8116 5.4082 -.88703 
217.42 =-53.601 13,567 -,.98694 14.224 -8.6818 6.4616 -.90557 
258.61 - -61.465 14.954 -,98838 | 18.735 -11.121 7.7804 -~.92109 
306.78 -70.347 16,471 -.98964 | 24.892 -14.319 9.4423 ~.93401 


362,91 -80.350 18.124 -.99074 | 33.339 -18.539 11.550 -.94e/s 
428.11 -91.586 19.922 -.99171 | 44.986 -=-24.139 14.243 --.95361 
503.57 <=-104,17 21.874 -.99256 |61.132 -31.616 17.706 ~--,.96097 
691.03 -118,31 24.003 ~-.99332 | 83.638 -41.664 22.193 ~-.96706 
691.78 -134.10 26.311 -.99398 | 115.19 -55.252 28.046 -.97211 


807.71 -151.73 28.813 ~-.99457 | 159.66 -73.750 35.740 -.97630 
940.38 -171.30 -.99509 | 222.74 -99.100 45.930 -.97979 
1091.4 -192.92 -.99555 | 312.73 -134.08 59.526 -.98270 
1265.4 -217,17 -.99596 | 441.92 -182.68 77.810 -.98514 
1458.6 -243.23 -.99632 | 628.58 -250.68 102.59 -.98718 


1677.8 =-271.99 -.99665 | 899.99 -346.53 136.44 -.98890 
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When truncation or type I censoring occurs on the left, entries in this table correspond- 
ing to% = € are applicable. For right truncated or type I right censored samples, read 
entries corresponding to 7 = -€ , but delete negative signs from y,, and (. For both 
type II left censored and type II right censored samples, read entries corresponding to 
Percent Restriction = 100h, but for right censoring delete negative signs from Bio and P- 
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was the work of Bliss.) For any given truncated or type I censored sample, 
after calculating £ = (2 — f)/é¢, enter the appropriate columns of table 3 with 
4 = £ if the restriction is on the left or with 4 = —é if the restriction is on the 
right, and interpolate to obtain the required values of the u,; . For type II 
censored samples, enter table 3 through the Percent Restricted column with 
Percent Restricted = 100h and interpolate to obtain the required values of the 
ui; - In all cases when restriction is on the left, the negative signs affixed to 
entries for u;. and p are retained, but are to be deleted for right restricted samples. 
With the u,; thus evaluated, the asymptotic variances and covariances may be 
approximated using (5) with o” replaced by its estimate ¢’. 

To illustrate the ease with which the tables presented here may be employed 
in practical situations, we select two examples that were previously considered 
in [1]. 

Left truncated sample. Data for this sample, which was given in [1] as example 
1, are summarized as follows: = 0.124624, s°> = 2.1106 X 10°°, x = 0.1215 
and n = 100. It follows that # = s’/(¢ — xo)’ = 0.21627 and linear interpolation 
in table 1 immediately yields 6 = 0.03012 which is in exact agreement with 
the value previously obtained in [1]. Using (1), we then compute a = 0.1245, 
é = 2.405 X 107°, and ¢ = 0.00155. For the variances and covariance, we 
enter table 3 with £ = (%) — a)/¢ = —1.94, and interpolate linearly to obtain 
Mi. = 1.2376, w2 = —0.26861, wo. = 0.76841, and p;,, = —0.2750. Note that 
My. and p;,, are negative since this sample is left restricted. When these values 
are substituted into (5), and a” is replaced by its estimate ¢? = 2.405 x 107°, 
the variances and covariance follow immediately as V(a@) = 2.98 x 10°°, 
V(é) = 1.85 X 107°, and Cov (a, ¢) = —0.65 X 107°, in agreement with the 
results obtained in [1]. Here, however, the necessary computational effort has 
been substantially reduced from that originally required. 

Right Censored Type II Sample. Data for this sample which was given in [1] 
as example 6 and which was originally given by Gupta [4], are summarized as: 
€ = 1,304.832, s* = 12,128,250, z, = 1,450.000, N = 300, and x = 119. It 
follows that 7 = s°/( — z,)’ = 0.575515 and h = 181/300 = 0.6033. Two-way 
linear interpolation in table 2 immediately yields } = 1.36. Using (3), we then 
compute 2 = 1,502, é” = 40,789, and ¢ = 202. For the variances and covariances 
we enter table 3 with Percent Restriction = 100h = 60.33 and interpolate 
linearly to obtain ,, = 2.022, wi. = 1.051, wa. = 1.635 and p;,, = 0.576. Note 
that here u,. and p;,; are positive since in this example the restriction is on the 
right side. Using the values determined above with é? = 40,789 substituted for 
o’, the variances and covariance follow from (5) as V(a) = 274.9, V(¢) = 222.3, 
and Cov (a, ¢) = 142.9. Except for errors in the signs of u,, , and p;,; which 
occur in [1], the results obtained here agree with the more laboriously computed 
results of the former paper. 

The assistance of Mr. Robert Everett and Mr. David Lifsey, who performed 
most of the computations involved in preparing these tables, is gratefully 
acknowledged. 
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The Folded Normal Distribution* 


F. C. Leone’, L. S. Netson’, R. B. Norrmivcnam® 


Measurements are frequently recorded without their algebraic sign. As a con- 
sequence, the underlying distribution of measurements is replaced by a distribution 
of absolute measurements. When the underlying distribution is normal, the resulting 
distribution is called the “folded normal distribution”. The authors describe methods 
for estimating the mean and standard deviation of the normal distribution based 
on estimates of the mean and standard deviation determined from the folded normal. 
Tables are provided to assist in the estimation procedure and an example included. 


1. INTRODUCTION 


In the past few years inspection by variables has found increasing use through 
efforts to increase inspection efficiency. The usual assumption of normality is 
valid in many applications, though not in all. One apparently fairly common 
type of non-normality has recently come to the attention of the authors. When- 
ever a difference or deviation is measured and the algebraic sign is unknown, 
disregarded, lost, or otherwise absent, the resulting distribution of these absolute 
measurements can range in shape from Daniel’s “half normal” [1] to the normal 
distribution as the limit. 

Typical of the examples which arise in industrial practice are: 


(1) measurement of flatness in which the object is placed convex side up (ir- 
respective of which side this is) and a feeler gauge is used to find the difference 
between the surface plate and the object. 

(2) measurement of the camber (straightness) of mininature radio tube leads 
(see Sec. 4). 

(3) determination of the centrality of the sprocket holes in motion picture film 
by finding the difference between the distance (Left edge to Left hole) and 
(Right edge to Right hole). The edge and hole designations are arbitrary 
because the (unused) film strip sample is not oriented. The distance be- 
tween the holes is constant since a single die punches the holes 
simultaneously. 


The characteristic which these examples share is that a measurement is 
recorded without its algebraic sign. The effect of dropping the sign is to add the 
otherwise negative values to the positive values. Geometrically this amounts 
to folding the negative side of the distribution onto the positive side. When 
the underlying distribution (of the algebraic values) is normal, the distribution of 
the absolute measurements is here described as the folded normal distribution. 

1 Case Institute of Technology, Cleveland, Ohio 


? General Electric Lamp Division, Cleveland, Ohio 
* North Electric Company, Galion, Ohio 
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Daniel’s half normal distribution is folded at the mean. In the general case 
treated here, the “fold” can occur at any distance from the original mean. We 
propose to consider first the folded normal probability density function (pdf), 
especially as it relates to the original normal pdf from which it came. We then 
consider some maximum likelihood estimates, followed by another estimating 
procedure which is simpler to handle. A table is presented for the estimation 
of uw and o (the mean and standard deviation of the original normal) given an 
estimate of uw, and a, (of the folded normal). A second table gives the areas 
of the folded normal for values of t, from 0 to 7 for eighteen different values of 
By/Oy « 

Finally, an example of real camber data is presented with an estimation of the 
theoretical distributions. 


2. THE Fotpep NorMat DistRIBUTION 
To derive a folded normal probability density function, first consider a normal 
probability density function 


—(y—B)*/20% 


1 
> , —-x <y<o, (1) 


where u and o” are the mean and the variance, respectively. Consider now the 

situation where x = | y |, with z > 0. The probability function has been “‘folded”’ 

at y = 0. Generally, in situations such as those previously mentioned, the values 

of » and o are not known. Thus the distance between the mean yz and the fold 
is unknown. 

The probability density function of the resulting folded normal distribution is 

1 —(z~—p)*/2e% —(z+p)*/2¢e% 
f(x) ae; le +e ], «#20. (2) 


By direct integration we obtain as the mean value of x 
By = V2/roe"?" + wll — 2F(—pn/o)], (3) 


where 


F(a) = Sa i et? dt. 
Similarily o; , the variance of the folded normal, is given by 
of = to” — {(2/n)hoe" + wll — 2F(—n/o)]}”. (4) 
Maximum likelihood estimates of u and o” are obtained in the usual manner. 
The estimating equations which result are 


x; 
ree 1 + ether 


= (na + Sa, /2 , (5) 
(m+ ¥ a) 


t=1 


— | = (x; + A)* — én |/aa. 


ras 1 fg Mees" 
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Equating these gives 
é° = (1/fn) Do ai — p. 
t=1 


Maximum likelihood estimation is not practical here because of the complexity 
of the calculation. 

Another approach for estimating u and o” is to obtain estimates of u, and o7 . 
It is reasonable to use as estimates of u, and a; the unbiased statistics 


= = (1/n) = xz; and s = [I/(m —1)] > (x; — 2)’, 


t=1 
respectively. With these estimates and equations (3) and (4) one can then 


solve for estimates of » and o”. 
From equations (3) and (4) we readily derive 


(u,/o;)° < 


I+ w/oy 9 


TABLE 1 
Values of of and u Corresponding to Various Values of us/o; 
In ¢ Units In o Units 


4 


64 
-66 
.68 


x 
° 


WNW NNN pee ee 
SaSRS FERS 


Seexe 2sseeeR 
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o = (uy + 0) /(1 + &), 


me 82 = et 
G(@) = V Ble OO = w/e. 


Table I gives values of a, and yu, both in o units, corresponding to values of 
u,/o, 80 spaced that linear interpolation beyond u,/c, = 1.50 yields values 


TaBLe IL 


Areas of the Folded Normal Distribution 
(Classified According to wu, /o;) 


ty 
F(y) = [ foddx ty 20 


eee ee 


0.0 
0.1 
0.2 
0-4 
0 

0.5 
0.6 
0.7 
0.8 
0.9 
1.0 
1.1 
1.2 
1.3 
Le 
1.5 
1.6 
1 

1:8 
1.9 
2.0 
2-1 
2.2 
2.3 
2.4 
2.5 
2.6 
2-7 
2.8 
2.9 
3.0 
3-2 
3.2 
"2 
3:8 


ee 
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correct to approximately one in the fourth decimal place. One can thus make 


use of the estimates of u, and oc, as given by £ and s to estimate o and » from 
Table I. 


3. AREAS OF THE FoLDED NorMAL DISTRIBUTION 


In many problems there may be no interest in relating the folded normal to a 
parent normal probability density function. One may wish to determine prob- 
abilities associated with the folded distribution. To solve such problems, Table 
II gives the areas under the folded normal from 0 to ¢, for various values of u,/o, . 


TABLE IL (continued) 


Areas of the Folded Normal Distribution 
(Classified According to us /o;) 


tf 
F(t) = { toddx ty 20 


Ferre CV OS NFWhPrOo 


0 
0 
0 
0 
oO. 
oO. 
0 
0 
0 
0 
1 


RRR H 
cee Leer Tere e Cheeses S86 o 6 Sees 8 
UFWNH CODIS UFWNrH CODNF Ww 


WWwWWwWw WNNDNM NNNNN NER 
. . 


. 
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It is interesting to observe that F(¢t,) for u,/o, equal to, say k, is not necessarily 
greater than F(t,) for u,/o,; equal to k’ > k. This is caused by u, and o, changing 
at different rates as the position of the fold changes. 


4. AN ILLUSTRATION 


In the manufacture of miniature radio tubes, leads are fixed in place by auto- 
matic machinery. It is important that the leads be straight, otherwise, they may 
fail to drop into position and can jam the feeding mechanism. Straightness is 
measured on an optical comparator by clamping one end of the lead in a rotating 
chuck. The camber is reported as the differences between the maximum and 


TABLE IL (continued) 


Areas of the Folded Normal Distribution 
(Classified According to us /o;) 


tf 
F (t,) = ff fxrdx ty 20 


+9663 
+9723 
9774 
-9816 
-9852 


-9882 
+9906 
+9926 
9942 
+9955 


+9965 
. “wae 
“Zine - 9980 
-9983.  .99 
-9987 998 


-9990 .9991 
+9993 

‘ -9995 

-99%  .99% . 

-9995 .9997 .9997 


-9997 .9998 .9998 
-9997 .9998 = .9999 
-9999 «9999 . 
-9999 .9999 .9999 .9999 
-9999 1.0000 1.0000 1.0000 


1.0000 


COONS VE WNRK COO OND NEFWNkHY OOOO NEF WNr OO ONO 


sees 


46 ¢ 6.8 eevee cee ee 


3 
3 
a 
3 
4 
y 
4 
4 
4 
4. 
4 
4 
4 
4 
5 
5 
5 
5 
5 
5 
5 
5 
5 
5 
6 
6 
6 
6 
6 
6. 
6 
6 
6 
6 
7 
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minimum positions assumed by the unclamped end upon rotation minus the 
diameter of the lead. Thus a lead that is perfectly straight will yield a camber 
of zero. It follows that camber is an inherently positive characteristic. Table III 
gives data taken from an experimental run. 

From Table III we can calculate = 14.0140, s = 7.7868, and #/s = 1.7997. 
Using 1.80 as an estimate of u,/o,; , Table I yields estimates of o,/o and u/o 
equal to 0.9242 and 1.6192, respectively. This gives ¢ = 8.4254 and u = 13.6424. 
Substitution of these values in equations (1) and (2) gives the estimated theoreti- 
cal normal and folded normal distributions, respectively. Figure 1 shows the 
data of Table III together with these distributions. 


TABLE IL (continued) 


Areas of the: Folded Normal Distribution 
(Classified According to uy /o;) 


tf 
F (14) = [todd ty 20 


- 8627 
- 8832 
-9015 
-9176 
+9316 


+9437 
954. 
9628 
“9702 
+9763 


-9813 
-9854 
- 9887 
9913 
+9934 


+9950 
-9963 
-9972 
-9980 
9985 


+9989 
-9992 
-9995 
-9996 
9999. -9997 


9999 -9998 
1.0000 9999 
-9999 

-9999 

1.0000 


.- eee . ee . eee 


1.0000 


3 
3 
3 
3 
4 
4 
4 
4 
4 
4 
4. 
4 
4 
4 
5 
5 
5 
5 
5 
5 
5 
5 
5 
5 
6. 
6 
6 
6 
6 
6 
6 
6 
6 
6 
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TABLE 3 


Camber of 497 Lead Wires 


Observed : Observed 
Frequency Camber* Frequency 


12 23 18 
26 25 14 
36 27 12 
50 29 4 
45 31 7 
49 33 2 


51 35 
44 37 
44 39 
40 41 
32 43 


Class mid-points are given. 


Ny (wy * 14.0140, o; © 7.7868) 
N(p# 13,6424, «© 8.4254) 


FREQUENCY 


CAMBER, MILS 


Fiacure 1 - Camber of 497 lead wires. 
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The Folded N cial Distribution: Two Methods 


of Estimating Parameters from Moments 


Recina C. Eranpt* 
Case Institute of Technology and Agricultural College, Poznan 


The general formula for the rth moment of the folded normal distribution is ob- 
tained, and formulae for the first four non-central and central moments are calculated 
explicitly. 

To illustrate the mode of convergence of the folded normal! to the normal distribu- 
tion, as n/o = @ increases, the shape factors By and B72 were calculated and the 
relationship between them represented graphically. 

Two methods, one using first and second moments (Method I) and the other 
using second and fourth moments (Method II) of estimating the parameters u and o 
of the parent normal distribution are presented and their standard errors calculated. 
The accuracy of both methods, for various values of 8, are discussed. 


HicHER MoMENTS OF THE FoLtpED NoRMAL 


First it will be convenient to introduce the following notation. We define the 
‘incomplete normal moment’ 


Ia) = val ye dy. r=1,2,--: (1) 


In particular, 


I,(a) = wel e” dy = 1 — F(a), 


where 


F(a) = TE i . et dy 


is the cumulative distribution function of the unit normal N(0, 1). 
It is easy to show that for r > 0 


Ia) = eae" + & — DI, -2(0). 


As a result of this we obtain 
a) If r is even 
I,(a) = Tz fav + ¢ — la’? + (r — Ir — 38) + --- 

+ (r — 1) — 38) «++ 5-Bale™*” + (r — 1) — 3) --- 5-3al,(a); 
b) If r is odd 
I,(a) = rm [fo + (r — 1a’? + ¢-— Ir — 3a + -:-- 

+ (r — 1)(r — 3) «++ 4-2]e*" 
* Docent of Agriculture College, Poznaf, Poland. 
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We call the distribution of x = | z |, where z is distributed as N(u, a), a folded 
normal distribution. The normal probability function is “folded” at z = 0. 
The density function of the folded normal distribution is 


1 —(z—4)*/20% —(z2+4)*/20? 
z) = ——— [e +e : xz>0 
f(z) ae [ ] 

The moments of the folded normal distribution can be expressed in terms of 
the J, function. 

The rth moment of the folded normal, u/,,) , is 


Bier) = [ xf(x) dz = L-| [ wig ee Tee. dx + | ge” zt */2e" ax | 
° oV 2x Lo 0 


a a : — ° idl 
= alfa tment ant | oe — we iy] 


1 ; r=j —y* 
- = | > ("vers e'”? dy 


1 " ’ (")(- ri i r-i—v2/2 
a 2 ee * i) 1)"™"(yo)'n'e* dy 


& (ere a(-2) + nr) 


> (Jo-t—0 + (-1)"'1,(0)], 


@= 
This is a general formula for the rth moment of the folded normal distribution. 
To calculate the first four moments more explicitly we notice that 
Io(— 0) — Io(@) = —[1 — 21,(—8)] = —[1 — 2F(6)] 
I,(—6) — 1,(6) = 0 
2 - 2 
iy —-6 i 6 = — 4 
(-6) + 1,(8) /on 
° = i e 
1(- - 10) = -| 6e~** 1 - 21-») | 
(—6) — 1,(8) -_— + ( (—)) 
I.(—6) + I.(6) = 1 
I;(—6) — I,(6) = 0 
2 : 
I,(—0) + I;(6) = —= (2 + 6)e* 
(—6) + I5(6) Ve Je 


I,(—@) +- T,(0) = 3, 
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Using (9) we obtain the first four moments 


2 vy 
Mya) = wy = \/ on ae en u{l — 2I,(—6)] 


= Taz oe **” — ull — 2F(6)] 


una = op =e +o’, 
Mya) = (uw? + 20°)us — wo°[1 — 2F(8)],’ 
Mya) = ne + 6y"o" + 30%. 
Therefore, the central moments are 
Bray = 9, 


2 2 2 
Mrz) = b +o — PF 4 


3 
3 2 og 40? 
Lr3) = re bE pe | 
V 2m 


* —_ 
Msay = (ui + Gyo” + 30°) + aa ou, + Wu? — 80*)uy — Bui 
sv 


We now express the moments in o units (cf. (8)). Putting 


we obtain, e.g. 


2 = 3 
Gay = Oy = wae i” — o[1 — 2F(6)], 


O02) =1+ & — G, 
1 ‘ 
673) = ol ot — 66,-— —=e™ i 
£(3) f f V/ 
Osa) = (8 + 6H + 6) + a e*”.@, + 2(6" — 3)6; — 36f. 
T 


SpecIAL Case oF ‘HALF’ NoRMAL DIstTRIBUTION 


If the distribution of z is N(0, c), then the distribution of x = | z | is ‘folded’ 
in half and we call it ‘half’ normal [1]. Its density function is 


f(x) = LB pone, (14) 
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In this case 6 = 0 and the rth moment of the ‘half’ normal is 
wi) = 20,0) =r =1,2,-:- 
Using this formula (or formulae (10) and (11) with 6 = 0) we obtain 


May = wh = ef? = 0.7979c. 


2m7—2 
eon 


= 0.36340’, 


ing < = 0.21800", 
T wv 


.3n — 4r — 12 
2 


T 


= 0.51090". 


Miia) = 0 
In this case p,/o, = 1.3237. 


Tue CONVERGENCE OF THE FOLDED NoRMAL TO A NORMAL DISTRIBUTION 


The whole family of folded normal distributions, N,(u,; , o;), is included 
between two extreme cases, i.e., between the ‘half’ normal, for which u,/o, = 
1.3237, and the normal, for which u,/c; is infinite, (Approximate normality is 
attained for u,/o, > 3). 

To show the mode of convergence of the folded normal to the normal distri- 
bution as 6 increases, the shape factors 8,, and 8,2. were calculated, where 


V Br 3/2 


My 2) 


Myia) 
Bp = 2 = 
Ms 2) 


for some values of 6. The results are given in Table 1 and in Figure 1. 
From Table 1 we can see that the ratio u,/o, increases with 6. The skewness, 


TABLE 1 
Shape Factors By and Br of the Folded Normal for Different Values of 0 = u/c 


n/c = 0 Ot O42) V 652) ur/ot V Bn Bn 


0.7979 0.3634 0.6028 1.3237 0.9953 0.9906 
0.8956 0.4479 0.6693 1.3381 0.9499 0.9023 
0.5576 0.7467 1.3933 0.8196 0.6717 
0.6390 0.7994 1.4593 0.7013 - 0.4918 
0.8208 0.9060 1.7203 0.4030 0.1624 
0.8490 0.9214 1.7870 0.3519 0.1238 
0.8962 0.9467 1.9316 0.2612 0.0682 
0.9317 0.9652 2.0897 0.1863 0.0347 
0.9800 0.9899 2.5295 0.0672 0.0045 
0.9952 0.9976 3.0080 0.0201 0.0040 


ONNRF ER ROO 
SCUNSCMAUNDMUSO 
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Figure 1— §;, 6: diagram for the folded normal. 


/B;,, of the folded normal decreases with increase of 6; however, the kurtosis 
attains a minimum value, 6,2 ~ 2.68 (approximately) for @ = 1.8. The relation- 
ship between 6,, and 872 is given by the curve in Figure 1. This is described in 
more detail below. 


Some REMARKS ABOUT THE DISTRIBUTION OF THE SAMPLE MEAN 


The mean, py , is usually estimated by the sample mean @. The exact distri- 
bution of # is rather complicated. From the central limit theorem, Z is approxi- 
mately distributed as N(u,; , c,/-*n) for sufficient large n. The convergence to 
normality depends, of course, on n, but also on 6 (or yu;/e,). To illustrate this, 
8,,(£) and B,2(#) have been calculated. It is easy to prove that for any random 
variable x we have 


g(a) = 2 (18) 


sx(@) = 3 + 2@— 3. 


The values of 8;,(Z) and 8,.(Z) for different values of n are given in Table 2. 
The same values can be obtained graphically. In Figure 1, joining the point 
(8y(z) , Bya(x)) to the point (0 , 3) by a straight line and cutting off a section of 
length equal to 1/n of the length of section {(6,:(z) , Ba(x)) , (0 , 3)} starting 
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from the point (0 , 3), we obtain the points (8,;(@) , By2(Z)) for different values 


of n. 


Figure 1 shows that the convergence of the distribution of ~ to normality 
increases with 6 (and so also with u,/c,;). 


u/o=0 


0.0 


0.5 


Shape 
factors 


V Bu(Z) 
Bu(Z) 
Bro Z) 


V Bn(Z) 


Bnu(Z) 
Br(Z) 


V Bn(2) 
Bu(Z) 
Bro(Z) 


V Bn(Z) 
Bu(Z) 
Br(Z) 


V Bn(2) 
Bu(Z) 
Br 2) 


V Ba(2) 
Bu(Z) 
Bel) 
V Bu(Z) 


Bn(Z) 
Br(Z) 


V Bn(2) 
Bu(Z) 
Br(Z) 


V Bn(2) 
Bu(Z) 
Br(Z) 


V Bn(2) 


Bu(Z) 
Bt2(Z) 


0.4976 
0.2490 
3.2172 


0.4749 
0.2256 
3.1765 


0.4098 
0.1679 
3.0836 


0.3507 
0.1230 
3.0201 


0.2015 
0.0406 
2.9287 


0.1759 
0.0310 
2.9232 


0.1306 
0.0171 
2.9208 


0.0931 
0.0087 
2.9287 


0.0336 
0.0011 
2.9605 


0.0100 
0.0010 
2.9832 


0.3519 
0.1245 
3.1086 


0.3358 
0.1128 
3.0882 


0.2898 
0.0840 
3.0418 


0.2480 
0.0615 
3.0101 


0.1425 
0.0203 
2.9645 


0.1244 
0.0155 
2.9616 


0.0924 
0.0085 
2.9604 


0.0659 
0.0043 
2.9643 


TABLE 2 
Shape factors B(x) and B(x) for different values of 0 and different sample sizes n. 


10 


0.3147 
0.0991 
3.0869 


0.3004 
0.0902 
3.0706 


0.2592 
0.0672 
3.0334 


0.2218 
0.0492 
3.0081 


0.1274 
0.0162 
2.9715 


0.1113 
0.0124 
2.9693 


0.0826 
0.0068 
2.9683 


0.0589 
0.0035 
2.9715 


0.0212 
0.0005 
2.9842 


0.0064 
0.0004 
2.9933 


Sample size (n) 


15 


0.2570 
0.0664 
3.0579 


0.2453 
0.0602 
3.0471 


0.2116 
0.0448 
3.0223 


0.1811 
0.0328 
3.0054 


0.1040 
0.0108 
2.9810 


0.0909 
0.0083 
2.9795 


0.0674 
0.0045 
2.9789 


0.0481 
0.0023 


20 


0.2226 
0.0498 
3.0434 


0.2124 
0.0451 
3.0353 


0.1833 
0.0336 
3.0167 


0.1568 
0.0246 
3.0040 


0.0901 
0.0081 
2.9857 


0.0787 
0.0062 
2.9846 


0.0584 
0.0034 
2.9842 


0.0416 
0.0017 


30 


0.1817 
0.0332 
3.0290 


0.1734 
0.0301 
3.0235 


0.1496 
0.0224 
3.0111 


0.1280 
0.0164 
3.0027 


0.0736 
0.0054 
2.9905 


0.0642 
0.0041 
2.9898 


0.0477 
0.0023 
2.9894 


0.0340 
0.0012 


60 


0.1285 
0.0166 
3.0145 


0.1226 
0.0150 
3.0118 


0.1058 
0.0112 
3.0056 


0.0905 
0.0082 
3.0013 


0.0520 
0.0027 
2.9952 


0.0454 
0.0021 
2.9949 


0.0337 
0.0011 
2.9947 


0.0067 
0.00004 | 
2.9984 | 


0.0020 


0.0002 0.00014 0.00007 0.00004 
2.9966 2.9978 2.9989 2.9993 
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ESTIMATION OF THE PARAMETERS yu AND o 

Method I. Use of the First and Second Moments. 
This method is presented in paper [2]. Using the formulae 


2 -402 
a 


Vu — nfl — 21,(—9)], 


, wa = 
Bra) eT 


, 2 2 
By(2) a t+¢, 


on 2 er 
V 2 


mM 
1p = Oa) = 1+ 6° 


— [1 — 2I(—94)], 


we obtain the relationship 


rae — 61 — 21,(— a} 


(19) 


aoe te 
My (2) 


Values of @ for different values of u,/o, are given in paper [2]. 
The moments yp; and yu/,.) are, of course, unknown. We estimate them from 
the formulae 


Using these estimates (i.e., replacing A by A = m‘?/m$) we obtain the estimate 6. 
The estimates of u and o are respectively 


eee 
a= be 


=1 3-2. 


Method II. Use of Second and Fourth Moments. 


We can obtain a simpler formula than (19) to estimate @ by using the second 
and fourth moments. We have 


Mya) OF <4) 3+ 60 + o 
Se = = ———_ = B. 21 
My (2) 6.2) (1 ss 6°)” ( ) 


Therefore, an estimate 6 of 6 may be obtained by solving the equation 


(1+ &)°B —- (3+ 67 + &) =0, (22) 
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where 
B= mi/m; 
Equation (22) is quadratic in 6° and can be solved explicitly. 

Using the second and fourth moments of x we are effectively dealing with the 
first and second moments of x’ (z’ is distributed as non-central x” with one 
degree of freedom and non-centrality equal to 6”.) 

We naturally would like to know which estimate of 6 is the more efficient. 


To answer this the standard error of each estimate of @ has been evaluated 
approximately. 


ACCURACY OF THE ESTIMATES OF 6 
The variance of the ratio of unbiased estimates & and 7 of two quantities 
a and 7 is approximately 
{ 
) 
“a (4) ‘ai () [ varied var (a) , var (7) vari) 2 cov (a, |. (23) 
4 ¥ a ay 


Since we use in our estimations, moments of rth order and their squares, 
we first need to obtain the expected values E(m/) and E(m’) and also var (m’), 
var (m!”) and cov (mi, , m’”? , where m! = ()."_, 21)/n. 

We know that 

E(m!) = w, (24) 
and 


- 1 
var (m;) = = [u3, — mr]. 


We now calculate E(m/”) and var (m/’’). 


_—— zi) = Lay a’ +2 > aie i 


i<j 


Ly wat) +2 Deepeey) = Ay[ ne, + 22D | 


1 = 
E(m!?) = = (us, + (nv — Iu]. (28) 
We similarly calculate 


n 4 

E(m!**) — 4.25 2) = + BS at “f- 4 2d aa + 6 de ax, 2r 
t=1 

> wir t+2 vaatet} 

I< 


i<k i<i<k<l 
ivti,intk 


- 1 nat + Ania — Yasue + 62D yes 


se on 92 i ies a 4 
4 19 mn ue 2) ul ul? + 24 n(n 1 2)(n 3) 
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Hence 
var - = E(m!*) — [E(m?))? 
= = {ule + 40 — 1)ug.ml + (Qn — 3)use + 4 — 1)(n — 8)uteus” 
— 2(n — 1)(2n — 3)yl*}. 
We now calculate cov (mi, , m,). 
cov (m!?m$,) = E{[m!? — E(m?’)][mi, — ubr)} 


= All (Ea) - mr] Ear - me] 


ek BS af +2 Datel +2 Datel 


n t=1 ti 


+2 2» x; Pajah — mol Dat + 2D z'ei} 


inj, “ob 


=4 | ma + 2M) ui? + nln — Iuben! 


49% — De — 2) ¢,.02 Ped - MoD yy. 


2 Marly — Ne-Npo, 2 2rMr 


Hence 


s 1 3 s 
cov (m!, m},) = a {ule + 2(n — l)us-uh — uh — 2(n — 1)us,u’ }. = (27) 


Variance of 6. (Method I). 
From (19) we have 


(1+ ®#)A — ye on i - “a [ie ee a} = = 0, (28) 


var (6) = var (4)| (44) c (29) 
Differentiating (28) with respect to A we have 


a+) +[204 - 2? ei - 41 - A f et an} 
{+2 feral] i - 


(1+ #) + A+2 2 4)2< -. m1 - arc@iktn - 2n(6) | af =0 = 
Putting 6 = 6, A = A we have 


0-6; dé 
, a ote _ —= = 
42) E Ores 26,{1 2n(o) || 2 | 0, 


6=8 
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TABLE 3 
Variance of the estimators of 0 


Method I 


(4)... var (6) var (@ 


0.2  99920.7960 11451.518 , 3505.70 , 11368.4 286287 .95 


n n n* n 

0.4 9319.5175 261.796 4 93.34 4 238.3 1636.22 
n n n n 

ian > seis 88.710 , 35.88 , 72.4 354.84 
n n n n 

39.670 18.15 ik 27.3 110.19 


0.6 365.2502 aici sae 
n n 


3 
eo 


0.8 


1.0 


1.2 


1.5 


2.0 


2.5 


3.0 


131.3463 


74.4025 


55.7545 


50.2857 


70.4163 


136.0174 


294.6441 


13.496 


a 
n 


7.151 
n 
4.946 
n 
3.816 
n 
3.682 


n 
4.385 
n 
5.581 


ceed, 
n? 
3.64 


n? 


1.98 


n° 


0.43 


+ + 
oS 
oO 


* 


or 
_ 


A 


wo 


O3/O38 
> “a 


Sos 


n 


and therefore, 


[24 | se etcetera ciate, (30) 
dA Sgn» 20,[ 60, + Gy ¢2)(1 a 2F(6))] 


Following formula (23) we have 


var (A) = var (mi) ww a (ui-)’ | warden (m;") 4 Var ( var (m:) 


72 , 
26am). 
Me Mi hm Mi He 
Variance of 6. (Method II) 

We now return to equation (22) 


(1+ &)°B —-(3+ 67+ 6) =0. 


var (6) = var | (%) |... 


Differentiating (22) with respect to B we have 
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TABLE 3—Continued 


Method II 


($)... var (6) xa (8) 


308.9157 BEI _ 21858.23 , 432066.1 183812.90 


n n® n 
‘aii 203.117 _ 551.08 , 3545.0 1269.48 
n n n n 
76.125 187.24 , 1279.0 304.50 
.  «£ n° n 
37.454 80.66 , 593.4 104.04 
s + n° n 
15.017 22.92 , 199.0 23.46 


1.1597 . a 7 a 


1.0000 a .. = 2.00 
n n n n 
_ 6.744 _ 5.09 53.6 4.68 
n n n 
5.602 2.27 , 29.0 
n n° n® 
5.490 _ 3.32, 14.7 
n n n° 
6.224 434 , 9.6 
a. a r 


3.8147 


2.1191 


1.6165 
3.8147 


9.2941 


-b 
21.4335 [2 Sy 


n 


2 
3 


9. 
n 
i. 
n 


(1 + &) + [211 + &)-26B — 126 — 45°} 2 i 


and finally 
ol oe (1 + §°)” eae 
~ 46((B — 3) + (B- 18] 
Putting 6 = @and B = 


| <3 ican ai i (32) 
dB 5=0 46[( O54) a 365 (2)) + 0°( Orca) = 67(2))] 
According to formula (23) we have 
, s\2 ’ 72 , 72 
var By = vax (My) (i) [ var 4. var (mnt) _ 2.00¥ (mi, ml. (ap 
Me Me Ma Me MsHe 


RELATIVE EFFICIENCY OF THE Two METHODS 


To compare the efficiency of the two methods approximate values of var (6) 
and var (6) were calculated for different values of @. The results are given in’ 
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Method I Method I 


Method II ----- Method II 


1.0 1.5 2.0 2.5 3.0 
> 
Fig. 2 et 3 
Comparison of variances Comparison of [coefficients of 
of estimators of @ variations]* of estimators of @ 


Fiaure 2—Comparison of variances of Ficure 3—Comparison of (coefficients 
estimators of 6. variations]? of estimators of @. 


Table 3. Neglecting the second and third terms, which is justified for sufficiently 


large n, curves of var (6) and var () are drawn in Figure 2. 

From Table 3 and Figure 2 we can see that the use of the first and second 
moments give relatively bigger variance for small @ (approximately @ < 0.62. 
For @ > 0.62 the use of second and fourth moments is the more efficient. 

From Table 1 of this paper (or Table 1 in paper [2] we can see that if @ < 0.62, 
then approximately u,/o, < 1.35. So we can say roughly, that if our observed 
ratio Z/s is less than 1.35, it is probably more accurate to estimate @ using the 
second and fourth moments. 

In Figure 3 there are given the relative variances var (6)/6? and var (6)/6 
(neglecting terms of order less than n~*). These are the squares of the coefficients 
of variation of the estimators of 6. These values are useful in indicating the 
percentage error to be expected in estimates of 6 by either method. 
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1. SUMMARY AND INTRODUCTION 


In recent years the use of stationary random inputs as a forcing function to 
experimentally determine the transfer function of linear systems has become 
widespread. The procedure involves the measurement of spectra and cross- 
spectra between the input and output and the formation of the proper ratio. 

There are two basic reasons for using random testing functions. For many situa- 
tions, particularly mechanical ones, these inputs are easier to generate than say 
steps or impulses and frequently they can be made to more closely approximate 
the in-service input. (The latter attribute is an advantage since the linearity of the 
system may not be complete but may be sufficiently so over the operating range 
of interest). A simple measure of linearity is immediately available i.e., the 
coherency. 

Although many experimenters try to generate a Gaussian stationary process 
for the forcing function this is not necessary from an expected value veiwpoint. 
Actually the probability structure plays no role in the general logic of the de- 
tailed procedure since only second moment characteristics of the process are 
relevant to the expected value calculations. However, a Gaussian input can be 
convenient since sometimes the evaluation of the variability of the estimates 
is simplified. In studying higher order systems by driving them with a random 
forcing function this use of a Gaussian process makes the calculations of the 
expected values considerably easier. In this paper we shall extend the spectral 
techniques of transfer function estimation of linear systems to time invariant 


quadratic systems when a stationary Gaussian process is used as a driving 
function. 











2. Time INVARIANT QuADRATIC SysTEM 


If X(t) is the input to a system S and Y(t) is its output, S will be said to be a 
time invariant quadratic system if 


) Y= / Ut — 1)X(1) dr + / [ ot — t= )X(n)X(r) des dre. 


The functions / and g are to be viewed in a general since in that they may con- 
tain delta functions and derivatives thereof. It is possible to avoid considera- 
tion of such functions by giving definitions in the Fourier domain and we do that 

1 This research supported by David Taylor Model Basin under Contract Nonr 285(17). Re- 
production in whole or in part is permitted for any purpose of the United States Government. 
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below. We refer to both / and q jointly as the impulse response function of S. 
Without loss of generality we can assume g(a, b) = q(b, a). The quantities L 
and Q, if the integrals exist, are defined by 


(2) U(r) = = e**"L(w) de, 


(3) q(n ? T2) = edt e tester, , Ws) dw, dary . 


We refer to them as the transfer function of the system. L and / are, of course, 
the transfer and impulse functions of a linear system. Let X (¢) admit of a Fourier 
representation 


(4) X() = fe" ake). 


Such a representation is possible if X(¢) is a stationary random process, or is 
an integrable deterministic function. Equation (4) can be written as 


(5) YC) = f e**L(u) dele) + ff et Q Gor , ws) Blox) dB(o). 


It is possible, and easier from a standpoint of rigor to use (5) as the definition 
of a time invariant quadratic system. 

Before we consider the estimation of ZL and Q it is well to say a few words 
about how the form can be expected to arise. It can arise by assumption be- 
cause this is the nature of the system. It also can arise because the extent of 
non linearity in the system is sufficiently small so that a perturbation expansion 
may be truncated after the quadratic terms. In a way (1) or (5) represents a 
kind of truncated polynomial expansion. Weiner [1] uses this type of expansion 
(5) to represent an arbitrary random process from Brownian motion process. 
He refers to the terms in the expansion as polynomial functionals. 


3. ESTIMATION OF THE TRANSFER FUNCTION 


Suppose that X(é) is a stationary Gaussian process with zero mean and spectral 
density s,,(w) and covariance function R,,(7). (It is sufficient to assume that 
the third and fourth cumulant functions of X(é) are identically zero but for 
simplicity we make the more restrictive assumption.) Then 


Ey) = [ff at — 1, t— 1)Relme — 1») dry dr 

(6) 
= | e%, —w)s,,(w) dw = M,. 

The cross-spectrum between the input and output s,,(w) is easily computed to 

be, if s,,(w) is the input process spectrum, 

(7) 8,,(w) = 8,.()L(@). 

Therefore 


(8) LW) = 8,,(w)/8,2(). 
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The usual procedure for estimating the transfer function of a linear system will 
calculate the linear component of the transfer of a quadratic system. The third 
moment function about the mean of the input, input, and output, is 


E[X(¢ + t.)X(t + 4){ YQ — ELY(O)}] = Mae , &) 


= [fae — 2,4 — re) EXC) (XE + XE + 4) 
— R,,(0)R..(72 — 1))} 


- [| q(t on 2 t— 72) [Rez(t +ii- 7:)R(t +i,— T2) 
+ R_At + t — 72) R(t + te ti 7)] 


= 2 I gf ise erne, » W2)Srz(W1)Sr2(w@2) dw, dw . 


In the above calculation we used the property that the third and fourth cumulant 
functions are zero. Following J. W. Tukey we call the Fourier Transform of 
M(t, , tz) the cross bi-spectrum. Therefore 


_ Cross Bi-Spec. (@: , 2) 
(10) Qe » 2) oi 2s,2(w1)8.2(we) , 
where, of course, 


(11) Cross Bi-Spec (w, ) We) = me I pone ? to) dt, dt, e 


The cross bi-spectrum defined above has certain obvious symmetry properties. 
Some are C.B.S. (w, , w.) = C.B.S. (wy , w,), since M3(t, , t:) = Ms(t, , t). Also 
E{X@X(Et + 4)[Y@ + &) — EY¢ + t)]} = Ms(a — 4, , — &). Equations 
(8) and (10) provide a procedure for estimating the transfer function of a time 
invariant quadratic system when a stationary Gaussian process is used as a 
forcing function. Now 


(12) Cov [¥()¥t+ 9] = / ef? | L(w) |? See(a) do 


+ 2 [| ” ii at | Qo, ’ We) |? s,.(w1)822(w2) dw, dw. . 


The Fourier transform of (12) is the spectral density, and is 
| L(s) ass) +2 [ [Qe — ff) snl — Delt) de. 


4. A Measure oF QuapRATIC COHERNECE 


If it may be assumed that there is virtually no noise in the measuring system, 
and if g = 0 we have a simple measure of the degree of linearity of the system. 
This quantity is the coherence defined by 


| _ [eo-spec]* + [quad-spec]’ 
(13) cohy2 (w) = (spec,) (Specs) 
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The coherence will be one at any frequency at which there is a linear relationship 
and less than one otherwise. It would be convenient to have a measure (with 
the reader’s indulgence) of the “Quadracity” of the system i.e., a measure of 
“goodness-of-fit” of the quadratic model. M. A. Woodbury has observed that 
the definition of linear coherency is the ratio of the output spectrum determined 
from cross-spectral techniques assuming the existence of a transfer function of 
the output spectrum. This observation leads immediately to a measure of 
“quadratic coherency.” It is 


quad. coh @w) = 
(1 4) 8yy() 


| s.,(w) |? , 1 f | — ,d) 7 
{Leola +3] << : eb} an} 


mere |C.BS. @ — 4,» /? \ 
= lin. coh @) + mw + { | ae = Kent dy 
In the above we have used (3, 8, 10, 11). 

It can easily be shown that (14) is less than or equal to one. The proof is 
exactly the same as the proof that the linear coherency is less than or equal to 
one. Eq. (14) will be one when the system is quadratic. 


5. CoNcLUDING REMARKS 


It is possible to take a somewhat different and perhaps more general view- 
point on the estimation equations (8, 10). If the quadratic model (5) represents 
the system then these equations are a means of estimating the parameters of 
the system. However, for an arbitrary system they are also a means of “‘fitting”’ a 
quadratic model of the form (5) to this arbitrary system* where the quad- 
coherency, (14), provides a measure of the success of the fitting. Therefore, 
the procedures contained herein can be used to obtain a “general” steady state 
solution, within the quadratic class (5) of, say, a non-linear differential equation 
be experimental means. This can be done by setting up the equation on, say, 
an analog computer and driving it with Gaussian noise. What fitting criterion 
this procedure conforms to is an open question. 

The reader should notice that two problem areas related to the use of these 
estimation procedures have been ignored. They are: computation procedures, 
bias and variability. There are two general procedures for making these calcula- 
tions. The first involves the calculation of the sample third moment, smoothing 
and taking Fourier transforms. It will be observed upon a little analysis that 
in contradistinction, to the case of spectral estimation, most of the effort in 
bispectral computation is in the Fourier transform. Therefore, it may be reason- 
able in some problems not to compute the transform over the entire range. 
The second procedure is analogous to computing a spectrum by using a narrow 
band filter and squaring and averaging the result. The filter is then tuned through 
the frequency bands of interest. For bispectral estimation the series are filtered 
through three narrow-band filters and the triple products taken and smoothed. 
Here one obtains only the bispectrum at the points of interest. The reader 
should bear in mind that bispectral computations are lengthy and should only 
be attempted on the fastest equipment. As for variability, we have performed 


* This is similar to approximating an arbitrary curve by a polynomial of a given order. 
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some computation on synthetic series and the indications are that one may need 
five times (Tukey suggests ten) as much data as in the spectrum case. It is 
hoped to make a more detailed report in a later paper. 
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Book Reviews 


StatisticaL THEORY AND METHODOLOGY IN SCIENCE AND ENGINEERING by K. A. 
Brownlee. John Wiley and Sons Ltd., New York, (1960), 570 pages, $16.75. 


Any intending author of an elementary statistical text has to face, right at the outset, very 
difficult questions of approach. Shall he try to instill a sound statistical out-look, quoting with- 
out proof any theoretical results he needs? Or, perhaps go a certain way in deriving results, 
quoting only in the case of the more sophisticated problems? The mathematical demands one 
is prepared to make of the reader are crucial. Unless one is aiming at mathematical specialists, 
many mathematical techniques and devices must be forsworn. This does not necessarily mean 
(though in practice it often does mean) that a theoretical treatment is ruled out: it is surprising 
how far one can go with a determined use of only elementary mathematics. Brownlee goes 
further than most; and it is this which gives his book its characteristic flavour. Naturally this 
involves a fair amount of tedious slogging but this, I think, is unavoidable with a mathematical 
framework restricted to elementary algebra, differentiation and very simple integration and 
without recourse anywhere to the magic phrase ‘it can be proved that ...’. Thus the book is 
to a high degree self-contained and scientists and technologists will find it very rewarding. 

The methods expounded are copiously illustrated by examples; there are exercises at the 
end of each chapter; and there is a useful collection of tables (from Biometrika Tables and 
from Hald’s collection). The book is undoubtedly destined to become a widely used teaching 
text. 

The book opens with a chapter called ‘Mathematical Ideas’ which deals with probability, 
random variables, distributions, expectation, variance and so on; and follows this with a chapter 
on ‘Statistical Ideas’ which gives very lucid brief introductions to Estimation, Maximum 
Likelinood, Hypothesis Testing and Confidence Limits. These introductory sections are follow- 
ed by chapters on the Binomial, Hypergeometric, Poisson, Chi-squared and Multinomial 
Distributions; Contingency Tables, tests of randomness, control charts, non-parametric tests, 
linear regression, and correlation. Chapters on the partitioning of sums of squares and on tests 
of equality for variances and means lead up to a very detailed account of analysis of variance 
(models I, II and mixed) and a quite substantial introduction to experimental design. 

The reviewer, even when he admires, is traditionally allowed a few carping criticisms. The 
following remarks, which fall into this category, do not detract from this reviewer’s admiration 
of an excellent book. 

The definition of consistency (of an estimator) is, as in most texts, non-Fisherian; it is the 
usual convergence-in-probability definition. This is not always altogether easy to apply in any 
specific case and some practical criteria might well been have quoted here. 

The section of Chapter 3 dealing with the Normal approximation to the Binomial distri- 
bution would be improved by a sentence or two explaining the requirements on n and p for the 
approximation to be a good one. 

In the same chapter confidence limits for the Binomial parameter, which require the solution, 
for @ and 6, of the equations 


z, 


> ("“\ra - a =P, 


z=0 
zo~1 n 

a ora — 0" =P, 
z=0 


are approximated by using the normal distribution and the F-tables, and ‘simple but tedious 
algebra’. Is this not a case where text books should begin to admit the existence of Binomial 
probability tables? Their direct use in this problem will rapidly provide the exact solution. 


Emlyn Lloyd 
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Error Correctine Cones, by W. W. Peterson. MIT Press and John Wiley and Sons 
Inc. New York, (1960), 330 pages, $7.75 


This book will be found very useful to engineers as well as mathematicians. The necessary 
background in group theory, field theory and in the theory of matrices is adequately developed 
as it is needed. Chapters on coding theory proper are preceded by an introduction to that part 
of the algebraic theory which is necessary for their understanding. This procedure is very 
skillfully applied and will serve to stimulate the interest of the non-mathematical reader and 
to keep his attention from flagging. On the other hand the algebraically educated reader will 
find the many-fold applications of algebra very stimulating. In fact, the problems arising in 
coding theory focus attention on many features of matrices and Galois-fields which have not 
been considered before. The authors chapter on linear switching circuits, giving much needed 
information on the technical implementation of codes, will be appreciated by the mathematical 
reader as it was by the reviewer. 


H. B. Mann 


Cartesian Tensors: AN INTRODUCTION, by G. Temple (Methuen’s Monographs on 
Physical Subjects) Methuen & Co. Ltd., London and John Wiley & Sons Inc., New 
York, (1960), 92 pages. 


This book furnishes an elegant introduction to cartesian tensors for students of applied math- 
ematics and physics. All the topics discussed by Sir Harold Jeffery in his classical textblok on 
cartesian tensors are there, arranged in a very systematic manner. In addition, there are topics 
such as the structure of tensors and spinors. The tensors are defined, following Bourbaki, as 
multilinear functions of direction. The structure of tensors is explained in terms of eigenvectors. 
The theory of spinors is introduced by the way of isotropic tensors as well as by the way of 
Clifford algebra. 

Every concept is illustrated with examples from elasticity, hydrodynamics, quantum 
theory and relativity. In addition to its simplicity and clarity, the book is self-contained. As 


such this book would be widely welcomed. 


R. P. Kanwal 
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Case Institute of Technology 


A Bibliography of Statistical Computer 
Routines, including the following and many 
other abstracts, has been compiled and will 
be available for distribution this fall. This 
can be obtained by writing to the Computing 
Center, Case Institute of Technology, Cleve- 
land 6, Ohio, and the cost will be $3.00. 
This compilation was made by Jack Alanen, 
Gary Andrew, Fred C. Leone and A. §&. 
Qureishi. 


Least Squares Curve Fitting on Unequally 
Spaced Arguments, TB 65, MRS 005. Bur- 
roughs 205, August 1959 


Provides the k + 1 coefficients yo, 71, 
2) «++ » Yk Of & polynomial 


k 
Piz) = 2x! 
i-0 


that best fits, in the least squares sense, a 
two-dimensional set of data having unequally 
spaced intervals of the independent variable 
(equally spaced data are, of course, accept- 
able). 

The program may be used independently 
or as a subroutine. Single precision floating- 
point arithmetic is used. Input of both the 
program and the data is via punched paper 
tape. Output is via the Flexowriter or High 
Speed punch and includes the polynomial 
coefficients, the z and y value observed, the 
y-value computed, and the square of the 
normalized error. 


Limitations: 
1. Degree of polynomial < 19 
2. Number of data points < 860 


A flow chart and program listing are avail- 
able. Burroughs Corporation, Electro Data 
Division, Applied Mathematics Section, 460 
Sierra Madre Villa, Pasadena, California. 
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Analysis of Variance, 130-MRS-024. Bur- 
roughs 220, December 1960 


Produces the sums of squares, degrees of 
freedom, mean squares and F ratios necessary 
for a complete analysis of variance. The 
number of levels at which factors occur may 
vary from factor to factor and replications 
are accommodated. The output can be 
produced on the supervisory printer or 
CARDATRON printer. 

Flow charts, timings, sample problems, 
and a program listing are included in the 
Technical Bulletin. 

Limitation: 

Any (fixed-effects, components of vari- 
ance, or mixed models) completely crossed 
design can be handled if the storage 
requirement determined by the following 
formula is satisfied: 


ont + Ti(Lfi] + 1) + 868 


i=1 


where nm = number of factors and 
L{t] = number of levels of factor i. 
Replication is treated as the last factor; 
hence, if there is no replication L{n] = 1. 


Burroughs Corporation, Electro Data Divi- 
sion, Applied Mathematics Section, 460 
Sierra Madre Villa, Pasadena, California. 


Parallel Probit Analysis “PAP”, IBM 704, 
32K, FORTRAN II 


Fits a linear regression line to quantal 
dose-response data according to the maximum 
likelihood method in Finney. The program 
will compute a parallel and/or a non-parallel 
(regular) probit analysis depending upon 
control cards and the relationships of the 
various probit lines to each other as follows: 


1. Try a parallel analysis and if the n lines 
are parallel, output and return to reader 
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to process more data. If the n lines are 
not parallel, do a regular analysis, out- 
put and return to reader for more data. 
Limitations are: Maximum number of 
lines = 30; Maximum number of points 
per line = 75. 

. Do a regular analysis only, output and 
return to reader. 

. If the data exceeds the above limitations 
for parallel analysis, the symbolic deck 
can be reassembled unless the combi- 
nation of lines and points per line exceeds 
memory. 

. If the data exceed the above limitations 
for regular analysis, a program strictly 
for regular analysis (PBT) is available 
which processes a single line at a time 
and can be reassembled to fit an 8, 16, 
or 32 K. With this program, regular 
probits can be run with n lines per group 
or singly. 


The process used is least squares iterative 
and its convergence is dependent upon the 
simultaneous convergence of the two para- 
meters (A and B of Y = A + BX) of the 
probit regression line. The convergence 
criterion, epsilon, is 10-7 but this can be 
changed on the input cards if so desired. 
Output occurs when: 1) Convergence of the 
process has occurred, or 2) Thirty iterations 
have occurred, unless convergence occurs 
first. 

Heterogeneity is tested by the Chi-Square 
test and if found to be present is used to 
enlarge the variances. The significance level 
desired can be varied at will, since the 
necessary Chi-square, F and ¢ values required 
are entered as input to the program. 

Fiducial limits are per Finney except when 
G is greater than 1, then they are the antilogs 
of m + t8m. 

Additional features are: 


1. Original Y’s and the B and A of the 
unweighted least squares first guess 
line are output for all sets of data. 

. If the lines tested are parallel, relative 
potencies for all combinations of 
included groups, including standard 
errors and XX% confidence limits, 
are output. 

. The following controlled by sense 
switch option: 

a) Use of either X (no transforma- 
tion) or Ln X (transformation). 


b) Weights equal to 1 or weights 
normal and equal to Z?/PQ. 

c) For regular probits only and 
should be used with a single line 
at a time: 

a3 Byun and Ajit 

2.B i and A; 

3. Y’s and Y’s — 5 

4. Calculated P’s, weights and 
working probits 

5. Iteration number. 


A sample problem is available. Reference: 
Finney, Chapter 5, section 20. University of 
California, Los Alamos Laboratory, P. O. 
Box 1663, Los Alamos, New Mexico. 


Periodic Regression, BIMD 21, IBM 709, 
October 1960 


Performs periodic or harmonic regression 
analysis using a regression function of the 
form: 


Y = ao + 2Z[a; cos (ic) + 0b; sin (ic)] 
“1 (nth harmonic). 


Periodic regressions will be computed up 
to the harmonic (n) specified in advance, 
but may stop short of this if a “good fit” is 
achieved with fewer terms. After each 
harmonic an ANOVA table is computed. 
For analysis of covariance, the number of 
harmonics is the same as that determined 
by the direct analysis of the dependent 
variable. 

Limitations: 


1. Highest harmonic allowed is 6. 

2. Maximum number of sub-intervals of 
time is 400. 

3. For analysis of covariance, only a 
single covariate is allowed at one 
time, however there is no limit to the 
number of covariates that can be 
processed singly. 

. For ANOVA output, an equal number 
of replications (between 2 and 20) 
must be present at each sub-interval 
of time. 


The statistical procedure and a sample out- 


put are available. Prof. W. J. Dixon, 
U. C. L. A., Division of Biostatistics, Dept. 
of Preventive Medicine and Public Health, 
Los Angeles 24, California. 
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Multiple Regression and Correlation Analysis, 
125-MRS 117, Burroughs 220, October 1960 


Method used is that of least squares. The 
basic equations were taken from Mood. 
A Crout reduction was used to invert the 
matrix and solve the normal equations. It is 
possible to obtain a comprehensive output 
and to have more than one dependent 
variable. Provision has been made for 
transformation of variables. 

As an option it is possible to force the 
regression plane through the origin. All 
commonly accepted significance tests are 
available, along with residuals based on each 
dependent variable. The means and variances 
of all variables are printed out. 

Provision has been made for entering 
hypothetical values of regression coefficients 
and the variance of the fit for each dependent 
variable. An option will print various 
matrices and their inverses. 

Limitations: 


Handles a maximum of 34 independent 
and 10 dependent variables, but not 
both simultaneously. 


A complete technical bulletin is available 
including sample programs, flow charts, 
program listings and running time estimates. 
Burroughs Corporation, Electro Data Divi- 
sion, Applied Mathematics Section, 460 
Sierra Madre Villa, Pasadena, Calif. 


Test of the Difference Between Two Sample 
Means for Independent Samples, F6-104, 
LGP-30 


Determines whether the difference between 
means of two independent samples is statis- 
tically significant. This subroutine is not 
restricted to samples of equal size, nor to the 
case of equal population variances for the 
samples involved. 

Times, a program listing, a sample pro- 
gram, and flow charts are available. Pool, 
1532 N. Cahuenga Blvd., Los Angeles, Calif. 


C. D. F. of the largest Characteristic Root 
IBM 650, (Dopsir) 


This routine evaluates the cumulative 
distribution function (standard form) of the 
largest characteristic root of certain matrices, 
i.e., it obtains Pr {0, < z|m, n} where @, is 
the largest root. The computations (done in 
double-precision, floating-point arithmetic) 
are based on expressions obtained from Roy’s 
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recursion formulae, normalized such that 
ultimate evaluation is in terms of incomplete 
beta ratios and the binomial probability 
distribution. The distribution is used to test 
several hypotheses occurring in multivariate 
analysis. 
Approximate computing time for each z: 
2(m + n + 5s) seconds. 
Limitations: 
Integral values of s, m, n, for 
s = 2(1)6 
m = 0(1) 10 
n>0 
0o<x<l 


University of North Carolina, Director 
Institute of Statistics, Chapel Hill, N. C. 


Tukey Spectrum Estimation, 574, IBM 704 
(4 tapes), December 1958 


Obtains the spectrum of one time series or 
the spectra, co-spectrum, and quadrature 
spectrum of two simultaneous time series. 
Also, for the case of a single time series, the 
autocorrelation is computed; for two simul- 
taneous time series, the auto and cross 
correlations are computed. For ease of 
plotting, the logs of the spectra are tabulated. 

In the case of two records, the coherence 
at each frequency between the records is 
also calculated. The phase lead (for each 
frequency) of the second time series over the 
first is calculated. The output contains 
provision for applying recording instrument 
frequency responses (amplitude and phase) 
to these. 

Limitation: Number of spectrum estimates 
< 511. 

Reference: J. W. Tukey, The Sampling 
Theory of Power Spectrum Estimates, 
Symposium on applications of auto correla- 
tion analysis to physical problems, Woods 
Hole, Massachusetts. 

SHARE Distribution Agency, Mr. Donald 
Cashman, International Business Machine 
Corp., 590 Madison Ave., New York 22, N. Y. 


Negative Binomial Probability Distribution, 
IBM 704 or 709 


The- program operates in two modes: 
1) input of a single mean and variance to 
output a single table, and 2) input of a 
single mean and initial, incremental, and 
final variances to output a series of tables. 
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Unlimited input in a chosen mode is restricted 
to one card (record) per input. 

Output tables include a heading, page 
number, associated mean and variance, 
the numerical value of the variables Q, 
x, 1 — 2jP;, Po, and four successive prob- 
abilities across each line. Each supplementary 
page of a table has its own page number and 
a modified heading. Up to 1000 probabilities 
can be computed and printed in floating 
point mode, from Po to Po», with a numerical 
threshold chosen by the user. 

Copies of the program, with operating 
instructions, will be provided upon request. 
M. Hershkowitz, Logistics Research Project, 
George Washington University, 707 22nd 
Street, N. W., Washington 7, D. C. 


Bivariate Normal Probability Evaluation, 
BVN, SHARE C3 LAFBVN, IBM 704 
FORTRAN II, August 1961 


The program computes the probability 
Pr(U > a, V > b; r) where U and V are 
random variables having a bivariate normal 
distribution with means pu, and y,, standard 
deviations o, and o,, and correlation r. 
Prepared as a Fortran II function sub- 
routine, the program can be compiled on any 
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machine capable of handling FORTRAN- 
type codes. It requires that the parameters 
and the constants a and b be available in 
storage. The method of evaluation is Simp- 
son’s rule applied to a single-variable integral 
obtained by applying a transformation to 
the usual two-variable integral. The program 
has been checked out by comparing its 
results with the Tables of the Bivariate Normal 
Distribution Function and Related Functions, 
Applied Mathematics Series 50, U. S. Govern- 
ment Printing Office, Washington, D. C., 
Issued June 15, 1959. 

The subroutine requires the standard 
FORTRAN II subroutines ASINF, COSF, 
EXPF, EXP(2, and SINF). In addition, it 
uses a special written error function evalua- 
tion subroutine, ERR169 (SHARE identi- 
fication: C3 LAFERR1), which is also written 
in FORTRAN II and can be used as a 
separate subroutine for other purposes. 
Because each is basically a simple routine, 
flow diagrams have not been prepared for 
these programs. Detailed write-ups and 
source and binary cards are available through 
SHARE Distribution Agency, Mr. Donald 
Cashman, IBM, 590 Madison Ave., New 
York 22, N. Y. 
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NOTICES 


New DEPARTMENT oF Statistics at S M U 


A new department of Mathematical and Experimental Statistics has been 
formed at Southern Methodist University to serve the North Texas area in the 
field of statistics. Course work will be offered leading to the degree of Master 
of Science. Individual courses may also be used for credit on mathematics and 
other degrees, both undergraduate and graduate. Several graduate assistantships 
are available carrying generous stipends. Application should be made to the 
Department. Staff: Paul D. Minton, Ph.D, and Vanamamalai Seshadri, Ph.D. 


Fioriwa State University EXPANDED PrRoGRAM IN STATISTICS 


Beginning in September 1962, the Department of Statistics at the Florida 
State University will expand its graduate program to study and research leading 
to the Doctor of Philosophy degree in statistics. The success of the present 
program, currently limited to study to the Master of Science degree in statistics, 
has led to the approval of the new advanced training. The curriculum will be 
modified and expanded to include advanced work in statistical inference and 
decision theory, stochastic processes, advanced probability, multivariate analy- 
sis, operations research, special topics in biometry, theory of general linear 
hypotheses, non-parametric statistics and sequential analysis. Advanced courses 
will be offered in alternate years. The program leading to the Master of Science 
degree will be modified to permit both the thesis and non-thesis types of programs. 

The new program will be assisted through the recent addition to the staff 
of Dr. Vincent Hodgson from the London School of Economics and Political 
Science. Dr. Hodgson joins Drs. Ralph A. Bradley, Frank Wilcoxon, Richard 
G. Cornell and S. K. Katti in the Department of Statistics. Additional faculty 
appointments within the next two years are anticipated. 

Facilities have been greatly improved through completion of a new mathe- 
matical sciences building in which instructional, laboratory and office space was 
designed for use of the faculty and graduate students in statistics. The new 
building also houses an IBM 709 computer for research use. 

The Department has a limited number of teaching and research assistant- 
ships available. Proposals for three-year graduate fellowships are pending and 
such fellowships should be available for graduate students entering the new 
program in September. Interested students are invited to write to Dr. R. A. 
Bradley, Department of Statistics, the Florida State University, Tallahassee, 
Florida for further information. 
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Errata 


““Some New Three Level Designs for the Study of 
Quantitative Variables” 


G. E. P. Box anp D. W. Bennxen, TecHNomertnics, Vol. 2, No. 4, November, 1960 


Correct Table 5a, pg. 465, third formula from the bottom to 
Cov (bob:;) = —[1/snolo’. 
Table 5c pg 466 should read for 


Design 1, B = 3/16 = and not 1/4 
2,B = 5/48 = and not 1/8 
3,B = 7/96 and not 1/12 
5,B = 1/18 and not 1/16 
8, B = 23/1200 and not 1/48. 


The first line below Table 6 on pg 467 should read: B = 5/48 and not B = 1/8. 
The second formula on pg 468 should read 


bi, = (5/48)(1033.6) — (1/48)(3061.6) — (1/2)(90.6) = —1.416. 
The second Analysis of Variance Table in section 6.2 on pg 468 should read 


8.8. d.f. m.s. 


Residual } Replicated center points 21.14 10.58 
Remainder 105.57 10.56 


126.71 
The third formula in section 6.4 on pg 469 should read 


} ae 
S.E. (b;,) — 2.118 +75 — 63. 


Line 9 on pg 464 should read: 
This method is used where the block size s > 2 and 7s employed for designs. . . . 


“Tables of Tolerance-Limit Factors for Normal Distributions” 


AtFrep WEISSBERG AND GLENN H. Beatty, TecHomertnrics, Vou. 2, No. 4, 1960 


Page 484, 3rd equation, upper limit of integral: ““N’”? + r” should read 
“ne + r” 

Page 494, 7th column of tabulation: “7 — y’” should read “7 — y’” The quantity 
tabulated is 7 — vy. 
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ERRATA 577 


Page 495, 10th line from bottom: “‘s = unbiased sample estimate of population 
standard deviation” should read “‘s’ = unbiased sample estimate of population 
variance” 

Page 496, 3rd line from bottom: “interaction mean square” should read 
“residual mean square’’ The residual mean square was used to obtain the variance 
ratio 8.83 in the tabulation on page 496. 

Page 497, 2nd line: “interaction mean square’’ should read “residual mean 
square” The residual mean square was used to obtain the variance ratio 2.04 
in the tabulation on page 496. 

Pages 496-498: In the analysis of variance example it is implied that joint 
tolerance limits for all nine cell means can be computed from a single sample. 
Similarly, in the regression analysis example it is implied that tolerance-limit 
curves, based on a single sample, are applicable to predictions on y corresponding 
to more than one choice of x. Joint tolerance-limit intervals such as these require 


a complicated adjustment for the consequence effect on the confidence 
coefficient 7. 
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